A NEW APPROACH TO STRONG EMBEDDINGS 



SOURAV CHATTERJEE 

Abstract. We revisit strong approximation theory from a new per- 
spective, culminating in a proof of the Komlos-Major-Tusnady embed- 
ding theorem for the simple random walk. The proof is almost entirely 
based on a series of soft arguments and easy inequalities. The new tech- 
nique, inspired by Stein's method of normal approximation, is applicable 
to any setting where Stein's method works. In particular, one can hope 
to take it beyond sums of independent random variables. 



1. Introduction 

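Let $\varepsilon_1, \varepsilon_2, \ldots$ be i.i.d. random variables with
$\mathbb{E}(\varepsilon_1) = 0$ and $\mathbb{E}(\varepsilon_1^2) = 1$.
For each $k$, let

$$S_k = \sum_{i=1}^{k} \varepsilon_i.$$

Suppose we want to construct a standard Brownian motion $(B_t)_{t \ge 0}$ on the
same probability space so as to minimize the growth rate of

$$\max_{1 \le k \le n} |S_k - B_k|. \tag{1}$$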

Since $S_n$ and $B_n$ both grow like $\sqrt{n}$, one would typically like to have the
above quantity growing like $o(\sqrt{n})$, and preferably, as slowly as possible.
This is the classical problem of coupling a random walk with a Brownian 
motion, usually called an 'embedding problem' because the most common 
approach is to start with a Brownian motion and somehow extract the ran- 
dom walk as a process embedded in the Brownian motion. 

The study of such embeddings began with the works of Skorohod [19, 20]
and Strassen [22], who showed that under the condition
$\mathbb{E}(\varepsilon_1^4) < \infty$, it is possible to make (1) grow like
$n^{1/4}(\log n)^{1/2}(\log\log n)^{1/4}$. In fact, this was shown to be the
best possible rate under the finite fourth moment assumption by Kiefer [12].

For a long time, this remained the best available result in spite of numerous
efforts by a formidable list of authors to improve on Skorohod's idea.
For a detailed account of these activities, see the comprehensive recent survey
of Obloj [16] and the bibliography of the monograph by Csorgo and
Revesz [6]. Therefore it came as a great surprise when Komlos, Major, and
Tusnady [13], almost fifteen years after Skorohod's original work, proved by
a completely different argument that one can actually have

$$\max_{k \le n} |S_k - B_k| = O(\log n)$$

when $\varepsilon_1$ has a finite moment generating function in a neighborhood of zero.
Moreover, they showed that this is the best possible result that one can hope
for in this situation.

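Theorem 1.1 (Komlos-Major-Tusnady [13]). Let $\varepsilon_1, \varepsilon_2, \ldots$ be i.i.d. random
variables with $\mathbb{E}(\varepsilon_1) = 0$, $\mathbb{E}(\varepsilon_1^2) = 1$, and
$\mathbb{E}\exp(\theta|\varepsilon_1|) < \infty$ for some $\theta > 0$.
For each $k$, let $S_k := \sum_{i=1}^{k} \varepsilon_i$. Then for any $n$, it is possible to construct
a version of $(S_k)_{0 \le k \le n}$ and a standard Brownian motion $(B_t)_{0 \le t \le n}$ on the
same probability space such that for all $x \ge 0$,

$$\mathbb{P}\Bigl(\max_{k \le n} |S_k - B_k| \ge C\log n + x\Bigr) \le K e^{-\lambda x},$$

where $C$, $K$, and $\lambda$ do not depend on $n$.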

The paper [13] also contains another very important result, a similar embedding
theorem for uniform empirical processes. However, this will not be
discussed in this article. See the recent articles by Mason [15] and Csorgo [1]
as well as the book [5] for more on the KMT embedding theorem for empirical
processes.

One problem with the proof of Theorem 1.1, besides being technically
difficult, is that it is very hard to generalize. Indeed, even the most basic
extension to the case of non-identically distributed summands by Sakhanenko [17]
is so complex that some researchers are hesitant to use it (see
also Shao [18]). A nearly optimal multivariate version of the KMT theorem
was proved by Einmahl [10]; the optimal result was obtained by Zaitsev [23]
at the end of an extraordinary amount of hard work. More recently, Zaitsev
has established multivariate versions of Sakhanenko's theorem [24, 25, 26].
For further details and references, let us refer to the survey article by
Zaitsev [27] in the Proceedings of the ICM 2002.

The investigation in this paper is targeted towards a more conceptual 
understanding of the problem that may allow one to go beyond sums of in- 
dependent random variables. It begins with the following abstract method 
of coupling an arbitrary random variable W with a Gaussian random vari- 
able Z so that W — Z has exponentially decaying tails at the appropriate 
scale. (Such a coupling will henceforth be called a strong coupling, to
distinguish it from the 'weak' couplings given by bounds on total variation or
Wasserstein metrics.)

Theorem 1.2. Suppose $W$ is a random variable with $\mathbb{E}(W) = 0$ and finite
second moment. Let $T$ be another random variable, defined on the same
probability space as $W$, such that whenever $\varphi$ is a Lipschitz function and $\varphi'$
is a derivative of $\varphi$ a.e., we have

$$\mathbb{E}(W\varphi(W)) = \mathbb{E}(\varphi'(W)T). \tag{2}$$

Suppose $|T|$ is almost surely bounded by a constant. Then, given any $\sigma^2 > 0$,
we can construct $Z \sim N(0, \sigma^2)$ on the same probability space such that for
any $\theta \in \mathbb{R}$,

$$\mathbb{E}\exp(\theta|W - Z|) \le 2\,\mathbb{E}\exp\Bigl(\frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2}\Bigr).$$

Let us make a definition here, for the sake of convenience. Whenever
$(W, T)$ is a pair of random variables satisfying (2), we will say that $T$ is a
Stein coefficient for $W$.

The key idea, inspired by Stein's method of normal approximation [21],
is that if $T \approx \sigma^2$ with high probability, then one can expect that $W$ is
approximately Gaussian with mean zero and variance $\sigma^2$. This conclusion
is heuristically justified because a random variable $Z$ follows the $N(0, \sigma^2)$
distribution if and only if $\mathbb{E}(Z\varphi(Z)) = \sigma^2\,\mathbb{E}(\varphi'(Z))$ for all continuously
differentiable $\varphi$ such that $\mathbb{E}|\varphi'(Z)| < \infty$. Stein's method is a process of getting
rigorous bounds out of this heuristic.
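
The Gaussian characterization above can be checked numerically for one test function. The sketch below is an illustration only: the choice $\varphi = \sin$, the midpoint quadrature, and the function names are assumptions made here, not part of the paper. For this $\varphi$ both sides equal $\sigma^2 e^{-\sigma^2/2}$.

```python
import math

def gaussian_expectation(g, sigma, half_width=12.0, steps=20_000):
    """Approximate E g(Z) for Z ~ N(0, sigma^2) with the midpoint rule."""
    lo, hi = -half_width * sigma, half_width * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dx
        density = math.exp(-z * z / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        total += g(z) * density * dx
    return total

def stein_gap(sigma):
    """Return E(Z phi(Z)) - sigma^2 E(phi'(Z)) for the test function phi = sin."""
    lhs = gaussian_expectation(lambda z: z * math.sin(z), sigma)
    rhs = sigma**2 * gaussian_expectation(math.cos, sigma)
    return lhs - rhs
```

A gap near zero for several values of `sigma` is consistent with the identity; it is of course not a proof of the "if and only if" statement.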

However, classical Stein's method can only give bounds on quantities like

$$\sup_{f \in \mathcal{F}} |\mathbb{E}f(W) - \mathbb{E}f(Z)|,$$

for various classes of functions $\mathcal{F}$. This includes, for example, bounds on the
total variation distance and the Wasserstein distance, and the Berry-Esseen
bounds. Theorem 1.2 seems to be of a fundamentally different nature.

To see how Stein coefficients can be constructed in a large array of situ- 
ations, let us consider a few examples. 

Example 1. Suppose $X$ is a random variable with $\mathbb{E}(X) = 0$, $\mathbb{E}(X^2) < \infty$,
and following a density $p$ that is positive on an interval (bounded or
unbounded) and zero outside. Let

$$h(x) := \frac{\int_x^{\infty} y\,p(y)\,dy}{p(x)} \tag{3}$$

on the support of $p$. Then, assuming ideal conditions and applying integration
by parts, we have $\mathbb{E}(X\varphi(X)) = \mathbb{E}(\varphi'(X)h(X))$ for all Lipschitz $\varphi$.
Thus, $h(X)$ is a Stein coefficient for $X$. The above computation is carried
out more precisely in Lemma 2.3 in Section 2.
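
As a concrete instance of Example 1 (chosen here for illustration, not taken from the paper), let $X \sim \mathrm{Uniform}[-1,1]$, so $p = 1/2$ on $[-1,1]$ and formula (3) gives $h(x) = (1 - x^2)/2$. The sketch below checks the identity exactly for the monomials $\varphi(x) = x^m$, which are Lipschitz on the bounded support; the function names are hypothetical.

```python
from fractions import Fraction

def uniform_moment(j):
    """E(X^j) for X ~ Uniform[-1, 1], as an exact fraction."""
    return Fraction(1, j + 1) if j % 2 == 0 else Fraction(0)

def stein_sides(m):
    """Both sides of E(X phi(X)) = E(phi'(X) h(X)) for phi(x) = x^m, m >= 1.

    Here h(x) = (1 - x^2) / 2, which is formula (3) for the density p = 1/2.
    """
    lhs = uniform_moment(m + 1)
    rhs = Fraction(m, 2) * (uniform_moment(m - 1) - uniform_moment(m + 1))
    return lhs, rhs
```

For example, `stein_sides(3)` returns `(Fraction(1, 5), Fraction(1, 5))`.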

Example 2. Suppose $X_1, \ldots, X_n$ are i.i.d. copies of the random variable $X$
from the above example, and let $W = n^{-1/2}\sum_{i=1}^{n} X_i$. Then by Example 1,

$$\mathbb{E}(W\varphi(W)) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathbb{E}(X_i\varphi(W))
= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\bigl(h(X_i)\varphi'(W)\bigr)
= \mathbb{E}\Bigl(\Bigl(\frac{1}{n}\sum_{i=1}^{n} h(X_i)\Bigr)\varphi'(W)\Bigr).$$

Thus, $\frac{1}{n}\sum_{i=1}^{n} h(X_i)$ is a Stein coefficient for $W$. Note that this becomes more
and more like a constant as $n$ increases, and so we can use Theorem 1.2 to
get more and more accurate couplings.
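
To see the concentration of the Stein coefficient in Example 2 in a case where everything is explicit, one can again take $X \sim \mathrm{Uniform}[-1,1]$ with $h(x) = (1-x^2)/2$ (an illustrative assumption, as are the function names). The sketch computes the exact mean and variance of $\frac{1}{n}\sum_i h(X_i)$: the mean equals $\mathrm{Var}(W) = 1/3$ and the variance decays like $1/n$.

```python
from fractions import Fraction

def uniform_moment(j):
    """E(X^j) for X ~ Uniform[-1, 1], as an exact fraction."""
    return Fraction(1, j + 1) if j % 2 == 0 else Fraction(0)

def stein_coefficient_mean_and_variance(n):
    """Exact mean and variance of (1/n) * sum_i h(X_i) with h(x) = (1 - x^2)/2."""
    mean_h = (1 - uniform_moment(2)) / 2
    second_moment_h = (1 - 2 * uniform_moment(2) + uniform_moment(4)) / 4
    var_h = second_moment_h - mean_h**2
    # Independence of the X_i gives Var of the average = Var(h(X)) / n.
    return mean_h, var_h / n
```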

Example 3. Suppose $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. symmetric $\pm 1$-valued r.v. Let
$S_n = \sum_{i=1}^{n} \varepsilon_i$. Let $Y \sim \mathrm{Uniform}[-1, 1]$ be independent of
$\varepsilon_1, \ldots, \varepsilon_n$. Let $W_n = S_n + Y$. Let

$$T_n = n - S_n Y + \frac{1 - Y^2}{2}.$$

It will be shown in the proof of Theorem 3.1 in Section 3 that $T_n$ is a Stein
coefficient for $W_n$. (The construction of this $T_n$ is somewhat ad hoc. The
author has not yet found a general technique for smoothing discrete
random variables in a way that can automatically generate a Stein coefficient.)
Letting $\sigma^2 = n$, Theorem 1.2 tells us that it is possible to construct
$Z_n \sim N(0, n)$ such that

$$\mathbb{E}\exp(\theta|W_n - Z_n|) \le 2\,\mathbb{E}\exp\Bigl(\frac{2\theta^2 (T_n - n)^2}{n}\Bigr).$$

Since $T_n = n + O(\sqrt{n})$ and $|W_n - S_n| \le 1$, it is now clear how to use
Theorem 1.2 to construct $S_n$ and $Z_n$ on the same probability space such
that irrespective of $n$,

$$\mathbb{E}\exp(\theta|S_n - Z_n|) \le C$$

for some fixed constants $\theta$ and $C$. By Markov's inequality, for all $x \ge 0$,

$$\mathbb{P}(|S_n - Z_n| \ge x) \le C e^{-\theta x}.$$

This is the first step in our proof of the KMT embedding theorem for the
simple random walk.
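
The claim that $T_n = n - S_nY + (1 - Y^2)/2$ is a Stein coefficient for $W_n = S_n + Y$ can be sanity-checked exactly on polynomial test functions. The sketch below is an illustration under assumptions made here: $\varphi(w) = w^m$ (Lipschitz on the bounded range of $W_n$), exact rational arithmetic, and hypothetical function names. It enumerates the binomial law of $S_n$ and integrates over $Y$ in closed form.

```python
from fractions import Fraction
from math import comb

def uniform_moment(j):
    """E(Y^j) for Y ~ Uniform[-1, 1]."""
    return Fraction(1, j + 1) if j % 2 == 0 else Fraction(0)

def mixed_moment(a, b, s):
    """E((s + Y)^a * Y^b) for a fixed integer s, by binomial expansion."""
    return sum(comb(a, i) * s ** (a - i) * uniform_moment(i + b) for i in range(a + 1))

def stein_sides(n, m):
    """Exact E(W phi(W)) and E(phi'(W) T_n) for phi(w) = w^m, m >= 1."""
    lhs = rhs = Fraction(0)
    for j in range(n + 1):
        s = n - 2 * j                      # value of S_n with j minus-signs
        prob = Fraction(comb(n, j), 2**n)  # P(S_n = s)
        lhs += prob * mixed_moment(m + 1, 0, s)
        # T_n = (n + 1/2) - s*Y - Y^2/2, multiplied by (s + Y)^(m-1).
        t_times_power = (
            (n + Fraction(1, 2)) * mixed_moment(m - 1, 0, s)
            - s * mixed_moment(m - 1, 1, s)
            - Fraction(1, 2) * mixed_moment(m - 1, 2, s)
        )
        rhs += prob * m * t_times_power
    return lhs, rhs
```

For instance, `stein_sides(1, 3)` gives `16/5` on both sides.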

Example 4. Suppose $X = (X_1, \ldots, X_n)$ is a vector of i.i.d. standard Gaussian
random variables. Let $W = f(X)$, where $f$ is absolutely continuous.
Suppose $\mathbb{E}(W) = 0$. Let $X' = (X'_1, \ldots, X'_n)$ be an independent copy of $X$.
Let

$$T := \int_0^1 \frac{1}{2\sqrt{t}} \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(X)\,
\frac{\partial f}{\partial x_i}\bigl(\sqrt{t}\,X + \sqrt{1-t}\,X'\bigr)\,dt.$$

Then one can show that $T$ is a Stein coefficient for $W$ (see [3], Lemma 5.3).
This has been used to prove CLTs for linear statistics of eigenvalues of
random matrices [3].

Example 5. Theorem 1.2 can be used to construct strong couplings for
sums of dependent random variables. An example of such a result is the
following.

Theorem 1.3. Suppose $X_1, \ldots, X_n, X_{n+1}$ are i.i.d. random variables with
mean zero, variance 1, and probability density $p$. Suppose $p$ is bounded above
and below by positive constants on a compact interval, and zero outside. Let
$S_n := \sum_{i=1}^{n} X_i X_{i+1}$. Then it is possible to construct $S_n$ and a Gaussian
random variable $Z_n \sim N(0, n)$ on the same probability space such that for
all $x \ge 0$,

$$\mathbb{P}(|S_n - Z_n| \ge x) \le 2e^{-C(p)x},$$

where $C(p)$ is a positive constant depending only on the density $p$ (and not
on $n$).

The process $(S_n)_{n \ge 1}$, upon proper scaling, is sometimes called the
'autocorrelation process' for the sequence $X_1, X_2, \ldots$. It may be possible to use the above
result to prove a KMT type coupling for autocorrelation processes. The
proof of Theorem 1.3 is short enough to be presented right here.

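Proof of Theorem 1.3. Let $X_0 = 0$. Let $h$ be defined as in (3). Then note
that for any $\varphi$, the definition of $h$ and Example 1 show that

$$\mathbb{E}(S_n\varphi(S_n)) = \sum_{i=1}^{n} \mathbb{E}\bigl(X_i X_{i+1}\varphi(S_n)\bigr)
= \sum_{i=1}^{n} \mathbb{E}\bigl(X_{i+1}(X_{i-1} + X_{i+1})h(X_i)\varphi'(S_n)\bigr).$$

This shows that if

$$D_i := h(X_i)X_{i+1}(X_{i-1} + X_{i+1}),$$

then $T_n := \sum_{i=1}^{n} D_i$ is a Stein coefficient for $S_n$. Now, for any $1 \le i \le n$,

$$\mathbb{E}(D_i - 1 \mid X_1, \ldots, X_{i-1}) = \mathbb{E}(h(X_i))\,\mathbb{E}(X_{i+1}^2) - 1 = 0,$$

since $\mathbb{E}(h(X_i)) = \mathbb{E}(X_i^2) = 1$. Moreover, it is easy to show by the
assumed conditions on $p$ that $|D_i|$ is almost surely bounded by a constant
depending on $p$. Therefore by the Azuma-Hoeffding inequality [11, 1] for
sums of bounded martingale differences, we get that for each $\alpha \in \mathbb{R}$,

$$\mathbb{E}\bigl(e^{\alpha(T_n - n)/\sqrt{n}}\bigr) \le e^{C_1(p)\alpha^2},$$

where $C_1(p)$ is some constant depending only on $p$. Thus if $Z$ is a standard
Gaussian random variable, independent of all else, then for any $\alpha \in \mathbb{R}$,

$$\mathbb{E}\bigl(e^{\alpha Z(T_n - n)/\sqrt{n}}\bigr) \le \mathbb{E}\bigl(e^{C_1(p)\alpha^2 Z^2}\bigr).$$

Therefore choosing $\alpha = C_2(p)$ small enough, one gets

$$\mathbb{E}\bigl(e^{C_2(p)Z(T_n - n)/\sqrt{n}}\bigr) \le 2.$$

On the other hand, first conditioning on $T_n$ we get

$$\mathbb{E}\bigl(e^{C_2(p)Z(T_n - n)/\sqrt{n}}\bigr) = \mathbb{E}\bigl(e^{C_2(p)^2 (T_n - n)^2/2n}\bigr).$$

By Theorem 1.2 this completes the proof. □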

Sketch of the proof of Theorem 1.2. (Full details are given in Section 2.)
First, let $h(W) := \mathbb{E}(T \mid W)$. Then $h(W)$ is again a Stein coefficient for $W$.
Moreover, one can show that the function $h$ is non-negative a.e. on the
support of $W$. It is not difficult to verify that to prove Theorem 1.2 it
suffices to construct a coupling such that for all $\theta$,

$$\mathbb{E}\exp(\theta|W - Z|) \le 2\,\mathbb{E}\exp\bigl(2\theta^2 (\sqrt{h(W)} - \sigma)^2\bigr).$$

Fix a function $r : \mathbb{R}^2 \to \mathbb{R}$. For $f \in C^2(\mathbb{R}^2)$, let

$$\mathcal{L}f(x, y) := h(x)\frac{\partial^2 f}{\partial x^2} + 2r(x, y)\frac{\partial^2 f}{\partial x\,\partial y}
+ \sigma^2\frac{\partial^2 f}{\partial y^2} - x\frac{\partial f}{\partial x} - y\frac{\partial f}{\partial y}.$$

Suppose there exists a probability measure $\mu$ on $\mathbb{R}^2$ such that for all $f$,

$$\int \mathcal{L}f\,d\mu = 0. \tag{4}$$



The main idea is as follows: every choice of $r$ that admits a $\mu$ satisfying (4)
gives a coupling of $W$ and $Z$. Indeed, suppose $\mu$ is as above and $(X, Y)$ is
a random vector with law $\mu$. Take any $\Phi \in C^2(\mathbb{R})$, and let $\varphi = \Phi'$. Putting
$f(x, y) = \Phi(x)$ in (4) gives

$$\mathbb{E}(h(X)\varphi'(X)) = \mathbb{E}(X\varphi(X)).$$

Since this holds for all $\varphi$ (which is a property that characterizes $W$) it is
possible to argue that $X$ must have the same law as $W$. Similarly, putting
$f(x, y) = \Phi(y)$, we get $\mathbb{E}(Y\varphi(Y)) = \sigma^2\,\mathbb{E}(\varphi'(Y))$, and thus, $Y \sim N(0, \sigma^2)$.

Note that this argument did not depend on the choice of $r$ at all, except
through the assumption that there exists a $\mu$ satisfying (4).

Now the question is, for what choices of $r$ does there exist a $\mu$ satisfying (4)?
In Lemma 2.1 it is proved that this is possible whenever the matrix

$$\begin{pmatrix} h(x) & r(x, y) \\ r(x, y) & \sigma^2 \end{pmatrix}$$

is positive semidefinite for all $(x, y)$, plus some extra conditions. Note that
this is the same as saying that the operator $\mathcal{L}$ is elliptic.

Intuitively, the 'best' coupling of $W$ and $Z$ is obtained when the choice of
$r(x, y)$ is such that the matrix displayed above is the 'most singular'. This
choice is given by the geometric mean

$$r(x, y) = \sigma\sqrt{h(x)}.$$

With this choice of $r$ and $f(x, y) = \frac{1}{2k}(x - y)^{2k}$ (where $k$ is an arbitrary
positive integer), a small computation gives

$$\mathcal{L}f(x, y) = (2k - 1)(x - y)^{2k-2}\bigl(\sqrt{h(x)} - \sigma\bigr)^2 - (x - y)^{2k}.$$
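
The small computation above can be cross-checked numerically. The following sketch is an illustration, not part of the paper: it approximates the partial derivatives of $f(x,y) = (x-y)^{2k}/(2k)$ by central finite differences, plugs them into the operator with $r = \sigma\sqrt{h(x)}$, and compares with the closed form. The argument `h_x` stands for the number $h(x)$ at the chosen point; all names and the step size are assumptions.

```python
import math

def f(x, y, k):
    """Test function f(x, y) = (x - y)^(2k) / (2k) from the sketch."""
    return (x - y) ** (2 * k) / (2 * k)

def generator_fd(x, y, k, h_x, sigma, d=1e-3):
    """Finite-difference value of Lf(x, y) when r = sigma * sqrt(h(x)).

    h_x is the value h(x); only this number enters Lf at the point (x, y).
    """
    fxx = (f(x + d, y, k) - 2 * f(x, y, k) + f(x - d, y, k)) / d**2
    fyy = (f(x, y + d, k) - 2 * f(x, y, k) + f(x, y - d, k)) / d**2
    fxy = (f(x + d, y + d, k) - f(x + d, y - d, k)
           - f(x - d, y + d, k) + f(x - d, y - d, k)) / (4 * d**2)
    fx = (f(x + d, y, k) - f(x - d, y, k)) / (2 * d)
    fy = (f(x, y + d, k) - f(x, y - d, k)) / (2 * d)
    r = sigma * math.sqrt(h_x)
    return h_x * fxx + 2 * r * fxy + sigma**2 * fyy - x * fx - y * fy

def generator_closed_form(x, y, k, h_x, sigma):
    """(2k-1)(x-y)^(2k-2)(sqrt(h(x)) - sigma)^2 - (x-y)^(2k)."""
    diff = x - y
    return (2 * k - 1) * diff ** (2 * k - 2) * (math.sqrt(h_x) - sigma) ** 2 - diff ** (2 * k)
```

The two functions should agree up to the finite-difference error, which is of order `d**2`.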

Since (4) holds for this $f$, we get

$$\mathbb{E}(X - Y)^{2k} = (2k - 1)\,\mathbb{E}\bigl((X - Y)^{2k-2}(\sqrt{h(X)} - \sigma)^2\bigr)
\le (2k - 1)\bigl(\mathbb{E}(X - Y)^{2k}\bigr)^{(k-1)/k}\bigl(\mathbb{E}(\sqrt{h(X)} - \sigma)^{2k}\bigr)^{1/k}.$$

This gives

$$\mathbb{E}(X - Y)^{2k} \le (2k - 1)^k\,\mathbb{E}(\sqrt{h(X)} - \sigma)^{2k}.$$

It is now easy to complete the proof by combining over $k \ge 1$.

The KMT theorem for the SRW. As an application of Theorem 1.2,
we give a new proof of Theorem 1.1 for the simple random walk. Although
this is just a special case of the full theorem, it is important in its own
right due to the importance of the SRW in various areas of science and
mathematics. For instance, within the last ten years, the KMT embedding
for the SRW played a pivotal role in the solution of a series of long-standing
open questions about the simple random walk by the quartet of authors
Dembo, Peres, Rosen, and Zeitouni [7, 8].

The proof of the KMT theorem for the SRW is obtained using a combination
of Theorem 1.2, Example 3, and an induction argument. The induction
step involves proving the following theorem about sums of exchangeable
binary variables. This seems to be a new result.

Theorem 1.4. There exist positive universal constants $C$, $K$ and $\lambda_0$ such
that the following is true. Take any integer $n \ge 2$. Suppose $\varepsilon_1, \ldots, \varepsilon_n$ are
exchangeable $\pm 1$ random variables. For $k = 0, 1, \ldots, n$, let $S_k = \sum_{i=1}^{k} \varepsilon_i$
and let $W_k = S_k - \frac{k}{n}S_n$. It is possible to construct a version of $W_0, \ldots, W_n$
and a standard Brownian bridge $(B_t)_{0 \le t \le 1}$ on the same probability space
such that for any $0 < \lambda < \lambda_0$,

$$\mathbb{E}\exp\Bigl(\lambda \max_{k \le n} |W_k - \sqrt{n}\,B_{k/n}|\Bigr) \le \exp(C\log n)\,\mathbb{E}\exp\Bigl(\frac{K\lambda^2 S_n^2}{n}\Bigr).$$

Note that by Example 2, it is possible to use Theorem 1.2 and induction
whenever the summands have a density with respect to Lebesgue measure
and the function $h$ is reasonably well-behaved. This holds, for instance, for
log-concave densities, or densities of the type considered in Theorem 1.3.
In such cases it is not very difficult (although technically messier than the
binary case) to prove a version of Theorem 1.4 using the method of this
paper. However, we do not know yet how to use Theorem 1.2 to prove
the KMT theorem in its full generality, because we do not know how to
generalize the smoothing technique of Example 3.

The theorem that we prove about the KMT coupling for the SRW, stated
below, is somewhat stronger than existing results.

Theorem 1.5. Let $\varepsilon_1, \varepsilon_2, \ldots$ be i.i.d. symmetric $\pm 1$-valued random variables.
For each $k$, let $S_k := \sum_{i=1}^{k} \varepsilon_i$. It is possible to construct a version of
the sequence $(S_k)_{k \ge 0}$ and a standard Brownian motion $(B_t)_{t \ge 0}$ on the same
probability space such that for all $n$ and all $x \ge 0$,

$$\mathbb{P}\Bigl(\max_{k \le n} |S_k - B_k| \ge C\log n + x\Bigr) \le K e^{-\lambda x},$$

where $C$, $K$, and $\lambda$ do not depend on $n$.


The above result is stronger than the corresponding statement about the
SRW implied by Theorem 1.1 because it gives a single coupling for the
whole process, instead of giving different couplings for different $n$. Such
results have been recently established in the KMT theorem for summands
with finite $p$th moment [14, 28].

The paper is organized as follows. In Section 2 we prove Theorem 1.2.
Two versions of Example 3 are worked out in Section 3. The main induction
step, which proves Theorem 1.4, is carried out in Section 4. Finally, the proof
of Theorem 1.5 is completed in Section 5.

2. Proof of Theorem 1.2

The proof will proceed as a sequence of lemmas. The lemmas will not be
used in the subsequent sections, and only Theorem 1.2 is relevant for the
future steps.

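Lemma 2.1. Let $n$ be a positive integer, and suppose $A$ is a continuous
map from $\mathbb{R}^n$ into the set of $n \times n$ positive semidefinite matrices. Suppose
there exists a constant $b \ge 0$ such that for all $x \in \mathbb{R}^n$,

$$\|A(x)\| \le b.$$

Then there exists a probability measure $\mu$ on $\mathbb{R}^n$ such that if $X$ is a random
vector following the law $\mu$, then

$$\mathbb{E}\exp\langle\theta, X\rangle \le \exp(b\|\theta\|^2) \tag{5}$$

for all $\theta \in \mathbb{R}^n$, and

$$\mathbb{E}\langle X, \nabla f(X)\rangle = \mathbb{E}\operatorname{Tr}(A(X)\operatorname{Hess} f(X)) \tag{6}$$

for all $f \in C^2(\mathbb{R}^n)$ such that the expectations $\mathbb{E}|f(X)|^2$, $\mathbb{E}\|\nabla f(X)\|^2$, and
$\mathbb{E}|\operatorname{Tr}(A(X)\operatorname{Hess} f(X))|$ are finite. Here $\nabla f$ and $\operatorname{Hess} f$ denote the gradient
and Hessian of $f$, and $\operatorname{Tr}$ stands for the trace of a matrix.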

Proof. Let $K$ denote the set of all probability measures $\mu$ on $\mathbb{R}^n$ satisfying

$$\int x\,\mu(dx) = 0 \quad\text{and}\quad \int \exp\langle\theta, x\rangle\,\mu(dx) \le \exp(b\|\theta\|^2) \ \text{ for all } \theta \in \mathbb{R}^n.$$

It is easy to see by the Skorokhod representation theorem and Fatou's lemma
that $K$ is a (nonempty) compact subset of the space $V$ of all finite signed
measures on $\mathbb{R}^n$ equipped with the topology of weak-* convergence (that is,
the locally convex Hausdorff topology generated by the separating family
of seminorms $|\mu|_f := |\int f\,d\mu|$, where $f$ ranges over all continuous functions
with compact support). Also, obviously, $K$ is convex.

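Now fix $\varepsilon \in (0, 1)$. Define a map $T_\varepsilon : K \to V$ as follows. Given $\mu \in K$, let
$X$ and $Z$ be two independent random vectors, defined on some probability
space, with $X \sim \mu$ and $Z$ following the standard gaussian law on $\mathbb{R}^n$. Let
$T_\varepsilon\mu$ be the law of the random vector

$$(1 - \varepsilon)X + \sqrt{2\varepsilon A(X)}\,Z,$$

where $\sqrt{A(X)}$ denotes the positive semidefinite square root of the matrix
$A(X)$. Then for any $\theta \in \mathbb{R}^n$,

$$\int \exp\langle\theta, x\rangle\,T_\varepsilon\mu(dx)
= \mathbb{E}\exp\bigl\langle\theta, (1 - \varepsilon)X + \sqrt{2\varepsilon A(X)}\,Z\bigr\rangle$$
$$= \mathbb{E}\exp\bigl(\langle\theta, (1 - \varepsilon)X\rangle + \varepsilon\langle\theta, A(X)\theta\rangle\bigr)$$
$$\le \exp(b\varepsilon\|\theta\|^2)\,\mathbb{E}\exp\langle\theta, (1 - \varepsilon)X\rangle$$
$$\le \exp\bigl(b\varepsilon\|\theta\|^2 + b(1 - \varepsilon)^2\|\theta\|^2\bigr).$$

For $\varepsilon \in (0, 1)$, $1 - \varepsilon + \varepsilon^2 < 1$. Hence, $b\varepsilon + b(1 - \varepsilon)^2 \le b$, and therefore $T_\varepsilon$ maps
$K$ into $K$. Since $A$ is a continuous map, and the transformation $A \mapsto \sqrt{A}$ is
continuous (see e.g. [2], page 290, equation (X.2)), it is easy to see that $T_\varepsilon$ is
continuous under the weak-* topology. Hence, by the Schauder-Tychonoff
fixed point theorem for locally convex topological vector spaces (see e.g.
Chapter V, 10.5), we see that $T_\varepsilon$ must have a fixed point in $K$. For each
$\varepsilon \in (0, 1)$, let $\mu_\varepsilon$ be a fixed point of $T_\varepsilon$, and let $X_\varepsilon$ denote a random vector
following the law $\mu_\varepsilon$.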
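
To build intuition for the fixed points $\mu_\varepsilon$, consider the simplest special case, which is an illustration chosen here and not part of the proof: dimension one and a constant matrix $A(x) \equiv a$. If $\mu = N(0, v)$ then $T_\varepsilon\mu = N(0, (1-\varepsilon)^2 v + 2\varepsilon a)$, so on centered Gaussian laws the map acts on the variance alone. The sketch below (hypothetical function names) iterates this variance recursion and compares it with the closed-form fixed point $2a/(2-\varepsilon)$, which tends to $a$ as $\varepsilon \to 0$; the law $N(0, a)$ indeed satisfies $\mathbb{E}(Xf'(X)) = a\,\mathbb{E}f''(X)$, the one-dimensional form of (6).

```python
def iterate_variance(a, eps, v0=0.0, iterations=1_000):
    """Iterate v -> (1 - eps)^2 * v + 2 * eps * a, the action of T_eps on N(0, v)."""
    v = v0
    for _ in range(iterations):
        v = (1 - eps) ** 2 * v + 2 * eps * a
    return v

def fixed_point_variance(a, eps):
    """Solve v = (1 - eps)^2 * v + 2 * eps * a, which gives v = 2a / (2 - eps)."""
    return 2 * a / (2 - eps)
```

In this toy case the fixed point is found by plain iteration; the general proof needs the Schauder-Tychonoff theorem because $A$ depends on $x$.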

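Now take any $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\operatorname{Hess} f$ bounded and uniformly
continuous. Fix $\varepsilon \in (0, 1)$, and let

$$Y_\varepsilon = -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\,Z.$$

By the definition of $T_\varepsilon$, note that

$$\mathbb{E}\bigl(f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon)\bigr) = 0. \tag{7}$$

Now let

$$R_\varepsilon = f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon) - \langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle
- \tfrac{1}{2}\langle Y_\varepsilon, \operatorname{Hess} f(X_\varepsilon)\,Y_\varepsilon\rangle.$$

First, note that

$$\mathbb{E}\langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle = -\varepsilon\,\mathbb{E}\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle. \tag{8}$$

By the definition of $K$, all moments of $\|X_\varepsilon\|$ are bounded by constants that
do not depend on $\varepsilon$. Hence, as $\varepsilon \to 0$, we have

$$\mathbb{E}\langle Y_\varepsilon, \operatorname{Hess} f(X_\varepsilon)\,Y_\varepsilon\rangle
= 2\varepsilon\,\mathbb{E}\operatorname{Tr}\bigl(\sqrt{A(X_\varepsilon)}\operatorname{Hess} f(X_\varepsilon)\sqrt{A(X_\varepsilon)}\bigr) + O(\varepsilon^{3/2})
= 2\varepsilon\,\mathbb{E}\operatorname{Tr}\bigl(A(X_\varepsilon)\operatorname{Hess} f(X_\varepsilon)\bigr) + O(\varepsilon^{3/2}). \tag{9}$$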
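Now, by the boundedness and uniform continuity of $\operatorname{Hess} f$, one can see that

$$|R_\varepsilon| \le \|Y_\varepsilon\|^2\,\delta(\|Y_\varepsilon\|),$$

where $\delta : [0, \infty) \to [0, \infty)$ is a bounded function satisfying $\lim_{t \to 0}\delta(t) = 0$.
Now, by the nature of $K$, it is easy to verify that the moments of $\varepsilon^{-1}\|Y_\varepsilon\|^2$
can be bounded by constants that do not depend on $\varepsilon$. Combining this
with the above-mentioned properties of $\delta$ and the fact that $\|Y_\varepsilon\| \to 0$ in
probability as $\varepsilon \to 0$, we get

$$\lim_{\varepsilon \to 0} \varepsilon^{-1}\,\mathbb{E}|R_\varepsilon| = 0. \tag{10}$$

Now let $\mu$ be a cluster point of the collection $\{\mu_\varepsilon\}_{0 < \varepsilon < 1}$ as $\varepsilon \to 0$, and let
$X$ denote a random variable following the law $\mu$. Such a cluster point exists
because $K$ is a compact set. By uniform integrability, equations (7), (8),
(9), (10) and the continuity of $A$, we get

$$\mathbb{E}\langle X, \nabla f(X)\rangle = \mathbb{E}\operatorname{Tr}(A(X)\operatorname{Hess} f(X)).$$

This completes the proof for $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\operatorname{Hess} f$ bounded and
uniformly continuous. Next, take any $f \in C^2(\mathbb{R}^n)$. Let $g : \mathbb{R}^n \to [0, 1]$ be
a function such that $g(x) = 1$ if $\|x\| \le 1$ and $g(x) = 0$ if $\|x\| \ge 2$.
For each $a > 1$, let $f_a(x) = f(x)g(a^{-1}x)$. Then $f_a \in C^2$ with $\nabla f_a$ and
$\operatorname{Hess} f_a$ bounded and uniformly continuous. Moreover, $f_a$ and its derivatives
converge pointwise to those of $f$ as $a \to \infty$, as is seen from the expressions

$$\frac{\partial f_a}{\partial x_i}(x) = \frac{\partial f}{\partial x_i}(x)\,g(a^{-1}x) + a^{-1}f(x)\,\frac{\partial g}{\partial x_i}(a^{-1}x),$$

$$\frac{\partial^2 f_a}{\partial x_i\,\partial x_j}(x) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x)\,g(a^{-1}x)
+ a^{-1}\frac{\partial f}{\partial x_i}(x)\,\frac{\partial g}{\partial x_j}(a^{-1}x)
+ a^{-1}\frac{\partial f}{\partial x_j}(x)\,\frac{\partial g}{\partial x_i}(a^{-1}x)
+ a^{-2}f(x)\,\frac{\partial^2 g}{\partial x_i\,\partial x_j}(a^{-1}x).$$

Since $\mathbb{E}\|X\|^2 < \infty$ and $\|A(x)\| \le b$, the above expressions also show that
if the expectations $\mathbb{E}|f(X)|^2$, $\mathbb{E}\|\nabla f(X)\|^2$, and $\mathbb{E}|\operatorname{Tr}(A(X)\operatorname{Hess} f(X))|$ are
finite, then we can apply the dominated convergence theorem to conclude
that

$$\lim_{a \to \infty} \mathbb{E}\langle X, \nabla f_a(X)\rangle = \mathbb{E}\langle X, \nabla f(X)\rangle \quad\text{and}\quad
\lim_{a \to \infty} \mathbb{E}\operatorname{Tr}(A(X)\operatorname{Hess} f_a(X)) = \mathbb{E}\operatorname{Tr}(A(X)\operatorname{Hess} f(X)).$$

This completes the proof. □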

Lemma 2.2. Let $A$ and $X$ be as in Lemma 2.1. Take any $1 \le i < j \le n$.
Let

$$v_{ij}(x) := a_{ii}(x) + a_{jj}(x) - 2a_{ij}(x),$$

where $a_{ij}$ denotes the $(i, j)$th element of $A$. Then for all $\theta \in \mathbb{R}$,

$$\mathbb{E}\exp(\theta|X_i - X_j|) \le 2\,\mathbb{E}\exp(2\theta^2 v_{ij}(X)).$$
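Proof. Take any positive integer $k$. Define $f : \mathbb{R}^n \to \mathbb{R}$ as

$$f(x) := (x_i - x_j)^{2k}.$$

Then a simple calculation shows that

$$\langle x, \nabla f(x)\rangle = 2k(x_i - x_j)^{2k},$$

and

$$\operatorname{Tr}(A(x)\operatorname{Hess} f(x)) = 2k(2k - 1)(x_i - x_j)^{2k-2}\,v_{ij}(x).$$

The positive semidefiniteness of $A$ shows that $v_{ij}$ is everywhere nonnegative. An
application of Holder's inequality now gives

$$\mathbb{E}|\operatorname{Tr}(A(X)\operatorname{Hess} f(X))| \le 2k(2k - 1)\bigl(\mathbb{E}(X_i - X_j)^{2k}\bigr)^{(k-1)/k}\bigl(\mathbb{E}\,v_{ij}(X)^k\bigr)^{1/k}.$$

From the identity (6) we can now conclude that

$$\mathbb{E}(X_i - X_j)^{2k} \le (2k - 1)\bigl(\mathbb{E}(X_i - X_j)^{2k}\bigr)^{(k-1)/k}\bigl(\mathbb{E}\,v_{ij}(X)^k\bigr)^{1/k}.$$

This shows that

$$\mathbb{E}(X_i - X_j)^{2k} \le (2k - 1)^k\,\mathbb{E}\,v_{ij}(X)^k.$$

To complete the proof, note that

$$\mathbb{E}\exp(\theta|X_i - X_j|) \le 2\,\mathbb{E}\cosh(\theta(X_i - X_j))
\le 2 + 2\sum_{k=1}^{\infty} \frac{(2k - 1)^k\,\theta^{2k}\,\mathbb{E}(v_{ij}(X)^k)}{(2k)!}.$$

By the slightly crude but easy inequality

$$\frac{(2k - 1)^k}{(2k)!} \le \frac{2^k}{k!},$$

the proof is done. □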
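
The final inequality can be checked exactly in integer arithmetic, and its effect on the series can be seen in the special case of a constant $v$ (an assumption made only for this illustration; the function names are also hypothetical). Since $2\sum_{k \ge 0} (2\theta^2 v)^k/k! = 2\exp(2\theta^2 v)$, the truncated series should stay below that value.

```python
import math
from math import factorial

def crude_inequality_holds(k):
    """Check (2k-1)^k / (2k)! <= 2^k / k! by cross-multiplying integers."""
    return (2 * k - 1) ** k * factorial(k) <= 2**k * factorial(2 * k)

def series_bound(theta, v, terms=40):
    """2 + 2 * sum_{k=1}^{terms} (2k-1)^k theta^(2k) v^k / (2k)! for a constant v >= 0."""
    total = 2.0
    for k in range(1, terms + 1):
        total += 2 * (2 * k - 1) ** k * theta ** (2 * k) * v**k / factorial(2 * k)
    return total
```

For example, `series_bound(0.5, 1.0)` can be compared with `2 * math.exp(0.5)`.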

Lemma 2.3. Suppose $p$ is a probability density function on $\mathbb{R}$ which is
positive on an interval (bounded or unbounded) and zero outside. Suppose
$\int_{-\infty}^{\infty} x\,p(x)\,dx = 0$. For each $x$ in the support of $p$, let

$$h(x) := \frac{\int_x^{\infty} y\,p(y)\,dy}{p(x)}.$$

Outside the support, let $h \equiv 0$. Let $X$ be a random variable with density $p$
and finite second moment. Then

$$\mathbb{E}(X\varphi(X)) = \mathbb{E}(h(X)\varphi'(X)) \tag{11}$$

for each absolutely continuous $\varphi$ such that both sides are well defined and
$\mathbb{E}|h(X)\varphi(X)| < \infty$. Moreover, if $h_1$ is another function satisfying (11) for
all Lipschitz $\varphi$, then $h_1 = h$ a.e. on the support of $p$.

Conversely, if $Y$ is a random variable such that (11) holds with $Y$ in
place of $X$, for all $\varphi$ such that $|\varphi(x)|$, $|x\varphi(x)|$, and $|h(x)\varphi'(x)|$ are uniformly
bounded, then $Y$ must have the density $p$.

Proof. Let $u(x) = h(x)p(x)$. Note that $u$ is continuous, positive on the
support of $p$, and $\lim_{x \to -\infty} u(x) = \lim_{x \to \infty} u(x) = 0$ since

$$u(x) = \int_x^{\infty} y\,p(y)\,dy = -\int_{-\infty}^{x} y\,p(y)\,dy.$$

Note that the above identity holds because $\int_{-\infty}^{\infty} x\,p(x)\,dx = 0$. Again, by the
assumption that $\mathbb{E}(X^2) < \infty$, it is easy to verify that

$$\int_{-\infty}^{\infty} u(x)\,dx = \mathbb{E}(X^2) < \infty.$$

When $\varphi$ is a bounded Lipschitz function, then (11) is just the integration
by parts identity

$$\int_{-\infty}^{\infty} x\varphi(x)p(x)\,dx = \int_{-\infty}^{\infty} \varphi'(x)u(x)\,dx.$$

Now take any absolutely continuous $\varphi$ and a map $g : \mathbb{R} \to [0, 1]$ such
that $g(x) = 1$ on $[-1, 1]$ and $g(x) = 0$ outside $[-2, 2]$. For each $a > 1$, let

$$\varphi_a(x) := \varphi(x)g(a^{-1}x).$$

Then

$$\varphi_a'(x) = \varphi'(x)g(a^{-1}x) + a^{-1}\varphi(x)g'(a^{-1}x).$$

It is easy to see that $\varphi_a$ and $\varphi_a'$ are bounded, and they converge to $\varphi$ and
$\varphi'$ pointwise as $a \to \infty$. Moreover, $|x\varphi_a(x)| \le |x\varphi(x)|$ and

$$|h(x)\varphi_a'(x)| \le |h(x)\varphi'(x)| + a^{-1}\|g'\|_{\infty}|h(x)\varphi(x)|.$$

Since we have assumed that $\mathbb{E}|X\varphi(X)|$, $\mathbb{E}|h(X)\varphi'(X)|$, and $\mathbb{E}|h(X)\varphi(X)|$ are
finite, we can now apply the dominated convergence theorem to conclude
that (11) holds for $\varphi$.

Suppose $h_1$ is another function satisfying (11) for all Lipschitz $\varphi$ and
$\mathbb{E}(X^2) < \infty$. Let $\varphi$ be a Lipschitz function such that
$\varphi'(x) = \operatorname{sign}(h_1(x) - h(x))$. Then

$$0 = \mathbb{E}\bigl(\varphi'(X)(h_1(X) - h(X))\bigr) = \mathbb{E}|h_1(X) - h(X)|.$$

This shows that $h_1 = h$ a.e. on the support of $p$.

For the converse, let $X$ have density $p$ and take any bounded continuous
function $v : \mathbb{R} \to \mathbb{R}$, let $m = \mathbb{E}v(X)$, and define

$$\varphi(x) := -\frac{1}{u(x)}\int_x^{\infty} p(y)(v(y) - m)\,dy = \frac{1}{u(x)}\int_{-\infty}^{x} p(y)(v(y) - m)\,dy$$

on the support of $p$. Since $u$ is nonzero and absolutely continuous everywhere
on the support of $p$, $\varphi$ is well-defined and absolutely continuous.
Next, we prove that $|x\varphi(x)|$ is uniformly bounded. If $x \ge 0$, then

$$|x\varphi(x)| = \Bigl|\frac{x}{u(x)}\int_x^{\infty} p(y)(v(y) - m)\,dy\Bigr|
\le \frac{2\|v\|_{\infty}}{u(x)}\int_x^{\infty} y\,p(y)\,dy = 2\|v\|_{\infty}.$$

Similarly, the same bound holds for $x < 0$. A direct verification shows that
$h(x)\varphi'(x) - x\varphi(x) = v(x) - m$.






Thus, $|h(x)\varphi'(x)|$ is uniformly bounded. Finally, by the continuity of $\varphi$,
$|\varphi(x)| \le \sup_{|t| \le 1}|\varphi(t)| + |x\varphi(x)|$ is also uniformly bounded.

So, if $Y$ is a random variable such that (11) holds for $Y$ in place of $X$ and
every $\varphi$ such that $|\varphi(x)|$, $|x\varphi(x)|$, and $|h(x)\varphi'(x)|$ are uniformly bounded,
then

$$\mathbb{E}v(Y) - \mathbb{E}v(X) = \mathbb{E}(v(Y) - m) = \mathbb{E}\bigl(h(Y)\varphi'(Y) - Y\varphi(Y)\bigr) = 0.$$

Thus, $Y$ must have the same distribution as $X$. □
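
Lemma 2.3 can be illustrated on a density with unbounded support. The example below is chosen here for illustration and does not appear in the paper: the centered exponential density $p(x) = e^{-(x+1)}$ on $[-1, \infty)$, for which the formula for $h$ gives $h(x) = x + 1$. The sketch computes $h$ by midpoint quadrature and evaluates both sides of (11) for $\varphi = \sin$; the function names, the truncation of the tail and the step counts are assumptions.

```python
import math

def density(x):
    """Centered exponential density p(x) = exp(-(x + 1)) on [-1, infinity)."""
    return math.exp(-(x + 1.0)) if x >= -1.0 else 0.0

def integrate(g, lo, hi, steps=100_000):
    """Midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * dx) for i in range(steps)) * dx

def h_numeric(x, tail=60.0):
    """h(x) = (integral of y p(y) over [x, infinity)) / p(x), truncated at x + tail."""
    return integrate(lambda y: y * density(y), x, x + tail) / density(x)

def stein_sides(upper=59.0):
    """E(X sin X) and E(h(X) cos X) with h(x) = x + 1, by quadrature on [-1, upper]."""
    lhs = integrate(lambda x: x * math.sin(x) * density(x), -1.0, upper)
    rhs = integrate(lambda x: (x + 1.0) * math.cos(x) * density(x), -1.0, upper)
    return lhs, rhs
```

A short computation with the exponential characteristic function gives the common value $\sin(1)/2$ for the two sides, which the quadrature should reproduce approximately.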

Proof of Theorem 1.2. First, assume $W$ has a density $p$ with respect to
Lebesgue measure which is positive and continuous everywhere. Define $h$ in
terms of $p$ as in the statement of Lemma 2.3. Then by the second assertion
of Lemma 2.3,

$$h(w) = \mathbb{E}(T \mid W = w) \quad\text{a.s.}$$

Note that $h$ is nonnegative by definition. So we can define a function $A$ from
$\mathbb{R}^2$ into the set of $2 \times 2$ positive semidefinite matrices as

$$A(x_1, x_2) := \begin{pmatrix} h(x_1) & \sigma\sqrt{h(x_1)} \\ \sigma\sqrt{h(x_1)} & \sigma^2 \end{pmatrix}.$$

Note that $A(x_1, x_2)$ does not depend on $x_2$ at all. It is easy to see that $A$ is
positive semidefinite. Also, since $p$ is assumed to be continuous, so
are $h$ and $A$. Since $T$ is bounded by a constant, so is $h$. Let $X = (X_1, X_2)$ be
a random vector satisfying (5) and (6) of Lemma 2.1 with this $A$. Take any
absolutely continuous $\varphi : \mathbb{R} \to \mathbb{R}$ such that $|\varphi(x)|$, $|x\varphi(x)|$, and $|h(x)\varphi'(x)|$
are uniformly bounded. Let $\Phi$ denote an antiderivative of $\varphi$, i.e. a function
such that $\Phi' = \varphi$. We can assume that $\Phi(0) = 0$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ as
$f(x_1, x_2) := \Phi(x_1)$. Then for some constant $C$, for all $x_1, x_2$,

$$|f(x_1, x_2)| \le C|x_1|, \quad \|\nabla f(x_1, x_2)\| \le C, \quad\text{and}\quad |\operatorname{Tr}(A(x_1, x_2)\operatorname{Hess} f(x_1, x_2))| \le C.$$

Thus, we can apply Lemma 2.1 to conclude that for this $f$,

$$\mathbb{E}\langle X, \nabla f(X)\rangle = \mathbb{E}\operatorname{Tr}(A(X)\operatorname{Hess} f(X)),$$

which can be written as

$$\mathbb{E}(X_1\varphi(X_1)) = \mathbb{E}(h(X_1)\varphi'(X_1)).$$

Since this holds for all $\varphi$ such that $|\varphi(x)|$, $|x\varphi(x)|$, and $|h(x)\varphi'(x)|$ are
uniformly bounded, Lemma 2.3 tells us that $X_1$ must have the same distribution
as $W$.

Similarly, taking any $\varphi$ such that $|\varphi(x)|$, $|x\varphi(x)|$, and $|\varphi'(x)|$ are uniformly
bounded, letting $\Phi$ be an antiderivative of $\varphi$, and putting $f(x_1, x_2) = \Phi(x_2)$,
we see that

$$\mathbb{E}(X_2\varphi(X_2)) = \sigma^2\,\mathbb{E}(\varphi'(X_2)),$$

which implies that $X_2 \sim N(0, \sigma^2)$. We now wish to apply Lemma 2.2 to the
pair $(X_1, X_2)$. Note that

$$v_{12}(x_1, x_2) = h(x_1) + \sigma^2 - 2\sigma\sqrt{h(x_1)} = \bigl(\sqrt{h(x_1)} - \sigma\bigr)^2.$$

Since $h(x_1) \ge 0$, we have

$$v_{12}(x_1, x_2) = \frac{(h(x_1) - \sigma^2)^2}{(\sqrt{h(x_1)} + \sigma)^2} \le \frac{(h(x_1) - \sigma^2)^2}{\sigma^2}.$$

Since $h(X_1)$ has the same distribution as $h(W)$, and $h(W) = \mathbb{E}(T \mid W)$,
the required bound can now be obtained using Lemma 2.2 and Jensen's
inequality.

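So we have finished the proof when $W$ has a probability density $p$ with
respect to Lebesgue measure which is positive and continuous everywhere.
Let us now drop that assumption, but keep all others. For each $\varepsilon > 0$,
let $W_\varepsilon := W + \varepsilon Y$, where $Y$ is an independent standard gaussian random
variable. If $\nu$ denotes the law of $W$ on the real line, then $W_\varepsilon$ has the
probability density function

$$p_\varepsilon(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\varepsilon}\,e^{-(x - y)^2/2\varepsilon^2}\,d\nu(y).$$

From the above representation, it is easy to deduce that $p_\varepsilon$ is positive and
continuous everywhere. Again, note that for any Lipschitz $\varphi$,

$$\mathbb{E}(W_\varepsilon\varphi(W_\varepsilon)) = \mathbb{E}(W\varphi(W + \varepsilon Y)) + \varepsilon\,\mathbb{E}(Y\varphi(W + \varepsilon Y))$$
$$= \mathbb{E}(T\varphi'(W + \varepsilon Y)) + \varepsilon^2\,\mathbb{E}(\varphi'(W + \varepsilon Y))$$
$$= \mathbb{E}((T + \varepsilon^2)\varphi'(W_\varepsilon)).$$

(Note that in the second step, we required that (2) holds for any derivative
of $\varphi$ instead of just one.) Thus, by what we have already proved, we can
construct a version of $W_\varepsilon$ and a $N(0, \sigma^2 + \varepsilon^2)$ r.v. $Z_\varepsilon$ on the same probability
space such that for all $\theta$,

$$\mathbb{E}\exp(\theta|W_\varepsilon - Z_\varepsilon|) \le 2\,\mathbb{E}\exp\Bigl(\frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2 + \varepsilon^2}\Bigr).$$

Let $\mu_\varepsilon$ be the law of the pair $(W_\varepsilon, Z_\varepsilon)$ on $\mathbb{R}^2$. Clearly, $\{\mu_\varepsilon\}_{\varepsilon > 0}$ is a tight
family. Let $\mu_0$ be a cluster point as $\varepsilon \to 0$, and let $(W_0, Z_0) \sim \mu_0$. Then
$W_0$ has the same distribution as $W$, and $Z_0 \sim N(0, \sigma^2)$. By the Skorokhod
representation, Fatou's lemma, and the monotone convergence theorem, it
is clear that

$$\mathbb{E}\exp(\theta|W_0 - Z_0|) \le \liminf_{\varepsilon \to 0}\,\mathbb{E}\exp(\theta|W_\varepsilon - Z_\varepsilon|)
\le 2\,\mathbb{E}\exp\Bigl(\frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2}\Bigr).$$

This completes the proof. □

3. Elaborations on Example 3

The goal of this section is to prove the following two theorems. The first
one is simply Example 3 from Section 1. The second one can be called a
conditional version of the same thing (which is harder to prove).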

Theorem 3.1. There exist universal constants $\kappa$ and $\theta_0 > 0$ such that the
following is true. Let $n$ be a positive integer and let $\varepsilon_1, \ldots, \varepsilon_n$ be i.i.d.
symmetric $\pm 1$ random variables. Let $S_n = \sum_{i=1}^{n} \varepsilon_i$. It is possible to construct a
version of $S_n$ and $Z_n \sim N(0, n)$ on the same probability space such that

$$\mathbb{E}\exp(\theta_0|S_n - Z_n|) \le \kappa.$$

Note that by Markov's inequality, this implies exponentially decaying tails
for $|S_n - Z_n|$, with a rate of decay that does not depend on $n$.

Theorem 3.2. Let $\varepsilon_1, \ldots, \varepsilon_n$ be $n$ arbitrary elements of $\{-1, 1\}$. Let $\pi$
be a uniform random permutation of $\{1, \ldots, n\}$. For each $1 \le k \le n$, let
$S_k = \sum_{\ell=1}^{k} \varepsilon_{\pi(\ell)}$, and let

$$W_k = S_k - \frac{k}{n}S_n.$$

There exist universal constants $c > 1$ and $\theta_0 > 0$ satisfying the following.
Take any $n \ge 3$, any possible value of $S_n$, and any $n/3 \le k \le 2n/3$. It
is possible to construct a version of $W_k$ and a gaussian random variable $Z_k$
with mean $0$ and variance $k(n - k)/n$ on the same probability space such that
for any $\theta \le \theta_0$,

$$\mathbb{E}\exp(\theta|W_k - Z_k|) \le \exp\Bigl(1 + \frac{c\,\theta^2 S_n^2}{n}\Bigr).$$

Both of the above theorems will be proved using Theorem 1.2. We proceed
as before in a sequence of lemmas that are otherwise irrelevant for the rest
of the manuscript (except Lemma 3.5, which has an important application
later on).

Lemma 3.3. Suppose $X$ and $Y$ are two independent random variables, with
$X$ following the symmetric distribution on $\{-1, 1\}$ and $Y$ following the uniform
distribution on $[-1, 1]$. Then for any Lipschitz $\varphi$, we have

$$\mathbb{E}(X\varphi(X + Y)) = \mathbb{E}((1 - XY)\varphi'(X + Y)),$$

and

$$\mathbb{E}(Y\varphi(X + Y)) = \tfrac{1}{2}\,\mathbb{E}((1 - Y^2)\varphi'(X + Y)).$$

Proof. We have

$$\mathbb{E}((1 - XY)\varphi'(X + Y)) = \frac{1}{4}\int_{-1}^{1}(1 + y)\varphi'(-1 + y)\,dy + \frac{1}{4}\int_{-1}^{1}(1 - y)\varphi'(1 + y)\,dy.$$

Integrating by parts, we see that

$$\int_{-1}^{1}(1 + y)\varphi'(-1 + y)\,dy = 2\varphi(0) - \int_{-1}^{1}\varphi(-1 + y)\,dy,$$

and

$$\int_{-1}^{1}(1 - y)\varphi'(1 + y)\,dy = -2\varphi(0) + \int_{-1}^{1}\varphi(1 + y)\,dy.$$

Adding up, we get

$$\mathbb{E}((1 - XY)\varphi'(X + Y)) = \frac{1}{4}\int_{-1}^{1}\varphi(1 + y)\,dy - \frac{1}{4}\int_{-1}^{1}\varphi(-1 + y)\,dy = \mathbb{E}(X\varphi(X + Y)).$$

For the second part, just observe that for any $x$, integration by parts gives

$$\frac{1}{2}\int_{-1}^{1} y\,\varphi(x + y)\,dy = \frac{1}{2}\int_{-1}^{1}\frac{1 - y^2}{2}\,\varphi'(x + y)\,dy.$$

This completes the proof. □

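Proof of Theorem 3.1. For simplicity, let us write $S$ for $S_n$. Let $Y$ be a
random variable independent of $\varepsilon_1, \ldots, \varepsilon_n$ and uniformly distributed on the
interval $[-1, 1]$. Suppose we are given the values of $\varepsilon_1, \ldots, \varepsilon_{n-1}$. Let $\mathbb{E}^-$
denote the conditional expectation given this information. Let

$$S^- = \sum_{i=1}^{n-1}\varepsilon_i, \qquad X = \varepsilon_n.$$

Then Lemma 3.3 gives

$$\mathbb{E}^-(X\varphi(S^- + X + Y)) = \mathbb{E}^-((1 - XY)\varphi'(S + Y)) = \mathbb{E}^-((1 - \varepsilon_n Y)\varphi'(S + Y)).$$

Taking expectation on both sides we get

$$\mathbb{E}(\varepsilon_n\varphi(S + Y)) = \mathbb{E}((1 - \varepsilon_n Y)\varphi'(S + Y)).$$

By symmetry, this gives

$$\mathbb{E}(S\varphi(S + Y)) = \mathbb{E}((n - SY)\varphi'(S + Y)).$$

Again, by Lemma 3.3 we have

$$\mathbb{E}(Y\varphi(S + Y)) = \tfrac{1}{2}\,\mathbb{E}((1 - Y^2)\varphi'(S + Y)).$$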
Thus, putting S = S + Y and 

l_y2 



T = n-SY + 

2 

we have 

(12) E(§^(5))=E(V(^)). 
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Let (j^ = n. Then 

a"^ ~ n 

Now, clearly, K(S) = and E(S'^) < oo. The equation (jl2p holds and the 
random variable T is a.s. bounded. Therefore, all conditions for applying 
Theorem 11.21 to S are met, and hence we can conclude that it is possible 
to construct a version of S and a A^(0, o"^) random variable Z on the same 
space such that for all 9, 

Eex.p{9\S - Z\) < 2Eexp(2eV-2(r-c72)2). 

Since the value of S is determined if we know S, we can now construct a 
version of S on the same probability space satisfying jS" — S"] < 1. It follows 
that 

Eexp(0|5 - Z\) < 2Eexp(|0| + 20V-2(r - a'^ f). 

Using the bound on (T — (T^)^/o"^ obtained above, we have 

Eexp(e|5 - Z\) < 2exp(|0| + 9"^ /n)Eexp{Ae^ S'^ /n). 

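To complete the argument, note that if $V$ is a standard gaussian r.v., independent
of $S$, then

\[ \mathbb{E}\exp(4\theta^2 S^2/n) = \mathbb{E}\exp(\sqrt{8}\,\theta V S/\sqrt{n}) = \mathbb{E}\bigl(\mathbb{E}(\exp(\sqrt{8}\,\theta V \varepsilon_1/\sqrt{n}) \mid V)^n\bigr) = \mathbb{E}\bigl(\cosh^n(\sqrt{8}\,\theta V/\sqrt{n})\bigr). \]

Using the simple inequality $\cosh x \le \exp x^2$, this gives

\[ \mathbb{E}\exp(4\theta^2 S^2/n) \le \mathbb{E}\exp(8\theta^2 V^2) = \frac{1}{\sqrt{1 - 16\theta^2}} \quad \text{if } 16\theta^2 < 1. \tag{13} \]

The conclusion now follows by choosing $\theta_0$ sufficiently small. □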
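Inequality (13) can be checked exactly for small walks, since $S_n$ has an explicit binomial law. The sketch below is illustrative only; the function names `mgf_square_walk` and `gaussian_bound` are hypothetical, and the check covers only the finitely many parameter values one chooses to try.

```python
import math


def mgf_square_walk(n, theta):
    """Exact E exp(4 theta^2 S_n^2 / n) for a simple symmetric random walk."""
    total = 0.0
    for heads in range(n + 1):
        s = 2 * heads - n  # value of S_n when `heads` of the steps equal +1
        total += math.comb(n, heads) * math.exp(4 * theta ** 2 * s * s / n)
    return total / 2 ** n


def gaussian_bound(theta):
    """Right-hand side of (13), valid when 16 theta^2 < 1."""
    return 1.0 / math.sqrt(1.0 - 16.0 * theta ** 2)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, mgf_square_walk(n, 0.2), gaussian_bound(0.2))
```

The bound does not depend on $n$, which is the property the proof needs.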

Lemma 3.4. Let all notation be as in the statement of Theorem 3.2. Then
for any $\theta \in \mathbb{R}$ and any $1 \le k \le n$, we have

\[ \mathbb{E}\exp(\theta W_k/\sqrt{k}) \le \exp\theta^2. \]

Remark. Note that the bound does not depend on the value of $S_n$. This is
crucial for the next lemma and the induction step later on. Heuristically,
this phenomenon is not mysterious because the centered process $(W_k)_{k \le n}$
has maximum freedom to fluctuate when $S_n = 0$.

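Proof. Fix $k$, and let $m(\theta) := \mathbb{E}\exp(\theta W_k/\sqrt{k})$. Since $W_k$ is a bounded
random variable, there is no problem in showing that $m$ is differentiable and

\[ m'(\theta) = \frac{1}{\sqrt{k}}\,\mathbb{E}(W_k\exp(\theta W_k/\sqrt{k})). \]

Now note that

\[ \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}(\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) = \frac{(n-k)\sum_{i=1}^{k}\varepsilon_{\pi(i)} - k\bigl(S_n - \sum_{i=1}^{k}\varepsilon_{\pi(i)}\bigr)}{n} = \sum_{i=1}^{k}\varepsilon_{\pi(i)} - \frac{kS_n}{n} = W_k. \]

Thus,

\[ m'(\theta) = \frac{1}{n\sqrt{k}}\sum_{i=1}^{k}\sum_{j=k+1}^{n}\mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\exp(\theta W_k/\sqrt{k})\bigr). \tag{14} \]

Now fix $i \le k < j$. Let $\pi' = \pi \circ (i, j)$, so that $\pi'(i) = \pi(j)$ and $\pi'(j) = \pi(i)$.
Then $\pi'$ is again uniformly distributed on the set of all permutations of
$\{1, \dots, n\}$. Moreover, $(\pi, \pi')$ is an exchangeable pair of random variables.
Let

\[ W_k' = \sum_{\ell=1}^{k}\varepsilon_{\pi'(\ell)} - \frac{kS_n}{n}. \]

Then

\[ \mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\exp(\theta W_k/\sqrt{k})\bigr) = \mathbb{E}\bigl((\varepsilon_{\pi'(i)} - \varepsilon_{\pi'(j)})\exp(\theta W_k'/\sqrt{k})\bigr) = \mathbb{E}\bigl((\varepsilon_{\pi(j)} - \varepsilon_{\pi(i)})\exp(\theta W_k'/\sqrt{k})\bigr). \]

Averaging the two equal quantities, we get

\[ \mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\exp(\theta W_k/\sqrt{k})\bigr) = \frac{1}{2}\,\mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})(\exp(\theta W_k/\sqrt{k}) - \exp(\theta W_k'/\sqrt{k}))\bigr). \]

Thus, from the inequality

\[ |e^x - e^y| \le \frac{1}{2}|x - y|(e^x + e^y) \]

and the fact that $W_k - W_k' = \varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}$, we get

\[ \bigl|\mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\exp(\theta W_k/\sqrt{k})\bigr)\bigr| \le \frac{|\theta|}{4\sqrt{k}}\,\mathbb{E}\bigl((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})^2(\exp(\theta W_k/\sqrt{k}) + \exp(\theta W_k'/\sqrt{k}))\bigr). \]

Since $(\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})^2 \le 4$ and $W_k'$ has the same distribution as $W_k$, the right hand side is at most $2|\theta| m(\theta)/\sqrt{k}$. Using this estimate in (14), we get

\[ |m'(\theta)| \le \frac{2|\theta|}{nk}\sum_{i=1}^{k}\sum_{j=k+1}^{n} m(\theta) \le 2|\theta|\,m(\theta). \]

Using that $m(0) = 1$, it is now easy to complete the proof. □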
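Because $S_k$ under a uniform random permutation is determined by a hypergeometric count, the bound of Lemma 3.4 can be verified exactly for small $n$. The following sketch assumes the sign vector contains `plus` entries equal to $+1$; the function name `mgf_centered` and its parameters are hypothetical.

```python
import math


def mgf_centered(n, plus, k, theta):
    """Exact E exp(theta * W_k / sqrt(k)) with W_k = S_k - k * S_n / n.

    The number h of +1 entries among the first k positions of a uniformly
    permuted sign vector is hypergeometric, and S_k = 2h - k.
    """
    s_n = 2 * plus - n
    total = 0.0
    for h in range(k + 1):
        # math.comb returns 0 when the lower index exceeds the upper one,
        # so impossible values of h contribute nothing.
        ways = math.comb(plus, h) * math.comb(n - plus, k - h)
        w = (2 * h - k) - k * s_n / n
        total += ways * math.exp(theta * w / math.sqrt(k))
    return total / math.comb(n, k)


if __name__ == "__main__":
    n = 10
    worst = max(
        mgf_centered(n, plus, k, 1.0)
        for plus in range(n + 1)
        for k in range(1, n + 1)
    )
    print("largest value at theta = 1:", worst, "bound:", math.exp(1.0))
```

In line with the Remark, the same bound is compared against every possible value of $S_n$.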

Lemma 3.5. Let us continue with the notation of Theorem 3.2. There exists
a universal constant $\alpha_0 > 0$ such that for all $n$, all possible values of $S_n$, all
$k$ such that $k \le 2n/3$, and all $\alpha \le \alpha_0$, we have

\[ \mathbb{E}\exp(\alpha S_k^2/k) \le \exp\Bigl(1 + \frac{3\alpha S_n^2}{4n}\Bigr). \]

Remark. The exact value of the constant 3/4 in the above bound is not
important; what is important is that the constant is $< 1$ as long as we take
$k \le 2n/3$. This is why the induction argument can be carried out in Section
4. However, there is no mystery; the fact that one can always get a constant
$< 1$ can be explained via simple heuristic arguments once Lemma 3.4 is
known.



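Proof. Let $Z$ be an independent standard gaussian random variable. Then

\[ \mathbb{E}\exp(\alpha S_k^2/k) = \mathbb{E}\exp\Bigl(\sqrt{\frac{2\alpha}{k}}\,Z S_k\Bigr) = \mathbb{E}\exp\Bigl(\sqrt{\frac{2\alpha}{k}}\,Z W_k + \sqrt{\frac{2\alpha}{k}}\,\frac{kS_n}{n}\,Z\Bigr). \]

Now, by Lemma 3.4 we have

\[ \mathbb{E}\Bigl(\exp\Bigl(\sqrt{\frac{2\alpha}{k}}\,Z W_k\Bigr) \Bigm| Z\Bigr) \le \exp(2\alpha Z^2). \]

Thus, we have

\[ \mathbb{E}\exp(\alpha S_k^2/k) \le \mathbb{E}\exp\Bigl(2\alpha Z^2 + \sqrt{\frac{2\alpha}{k}}\,\frac{kS_n}{n}\,Z\Bigr). \]

Since $S_n$ is nonrandom, the right hand side is just the expectation of a function
of a standard gaussian random variable, which can be easily computed.
This gives, for $0 < \alpha < 1/4$,

\[ \mathbb{E}\exp(\alpha S_k^2/k) \le \frac{1}{\sqrt{1 - 4\alpha}}\exp\Bigl(\frac{\alpha k S_n^2}{(1 - 4\alpha)n^2}\Bigr). \]

The lemma is now proved by bounding $k$ by $2n/3$ and choosing $\alpha_0$ small
enough to ensure that $1/(1 - 4\alpha_0)$ is sufficiently close to 1. □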
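The intermediate bound in this proof, which holds for every $0 < \alpha < 1/4$, can be compared with the exact value of the left-hand side for small $n$. This is a sketch under the same hypergeometric description of $S_k$ used for Lemma 3.4; the names `mgf_scaled_square` and `gaussian_side` are hypothetical.

```python
import math


def mgf_scaled_square(n, plus, k, alpha):
    """Exact E exp(alpha * S_k^2 / k) under a uniform random permutation."""
    total = 0.0
    for h in range(k + 1):
        ways = math.comb(plus, h) * math.comb(n - plus, k - h)
        s_k = 2 * h - k
        total += ways * math.exp(alpha * s_k * s_k / k)
    return total / math.comb(n, k)


def gaussian_side(n, plus, k, alpha):
    """(1 - 4a)^(-1/2) * exp(a * k * S_n^2 / ((1 - 4a) * n^2)) for 0 < a < 1/4."""
    s_n = 2 * plus - n
    return math.exp(alpha * k * s_n ** 2 / ((1 - 4 * alpha) * n ** 2)) / math.sqrt(1 - 4 * alpha)


if __name__ == "__main__":
    n, k, alpha = 12, 6, 0.02
    for plus in (6, 9, 12):
        print(plus, mgf_scaled_square(n, plus, k, alpha), gaussian_side(n, plus, k, alpha))
```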

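Proof of Theorem 3.2. For simplicity, we shall write $W$ for $W_k$ and $S$ for $S_n$,
but $S_k$ will be written as usual.

Let $Y$ be a random variable independent of $\pi$ and uniformly distributed
on the interval $[-1, 1]$. Fix $1 \le i \le k$ and $k < j \le n$. Suppose we are given
the values of $\{\pi(\ell), \ell \ne i, j\}$. Let $\mathbb{E}^-$ denote the conditional expectation
given this information. Let

\[ S^- = \sum_{\ell \ne i, j}\varepsilon_{\pi(\ell)}, \qquad W^- = \sum_{\ell \le k,\ \ell \ne i}\varepsilon_{\pi(\ell)} - \frac{kS}{n}. \]

If $S \ne S^-$, then we must have $\varepsilon_{\pi(i)} = \varepsilon_{\pi(j)}$ and hence in that case

\[ \mathbb{E}^-((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\varphi(W + Y)) = 0. \]

Next let us consider the only other possible scenario, $S = S^-$. Then the
conditional distribution of $\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}$ is symmetric over $\{-2, 2\}$. Let

\[ X = \frac{\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}}{2} = \varepsilon_{\pi(i)}, \]

and note that

\[ W = W^- + X. \]

Thus, under $S = S^-$, Lemma 3.3 shows that for all Lipschitz $\varphi$,

\[ \mathbb{E}^-((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\varphi(W + Y)) = 2\,\mathbb{E}^-(X\varphi(W^- + X + Y)) = 2\,\mathbb{E}^-((1 - XY)\varphi'(W + Y)) = \mathbb{E}^-\bigl((2 - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})Y)\varphi'(W + Y)\bigr). \]

Next, let

\[ a_{ij} := 1 - \varepsilon_{\pi(i)}\varepsilon_{\pi(j)} - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})Y. \]

A simple verification shows that

\[ a_{ij} = \begin{cases} 2 - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})Y & \text{if } \varepsilon_{\pi(i)} \ne \varepsilon_{\pi(j)}, \\ 0 & \text{if } \varepsilon_{\pi(i)} = \varepsilon_{\pi(j)}. \end{cases} \]

Thus, irrespective of whether $S = S^-$ or $S \ne S^-$, we have

\[ \mathbb{E}^-((\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})\varphi(W + Y)) = \mathbb{E}^-(a_{ij}\varphi'(W + Y)). \]

Clearly, we can now replace $\mathbb{E}^-$ by $\mathbb{E}$ in the above expression. Now, as in
the proof of Lemma 3.4, observe that

\[ W = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}(\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}). \]

Combining the last two observations, we have

\[ \mathbb{E}(W\varphi(W + Y)) = \mathbb{E}\Bigl(\Bigl(\frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n} a_{ij}\Bigr)\varphi'(W + Y)\Bigr). \]

Again, by Lemma 3.3 we have

\[ \mathbb{E}(Y\varphi(W + Y)) = \frac{1}{2}\,\mathbb{E}((1 - Y^2)\varphi'(W + Y)). \]

Thus, putting $\tilde{W} = W + Y$ and

\[ T = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n} a_{ij} + \frac{1 - Y^2}{2}, \]

we have

\[ \mathbb{E}(\tilde{W}\varphi(\tilde{W})) = \mathbb{E}(T\varphi'(\tilde{W})). \tag{15} \]

Now

\[ \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n} a_{ij} = \frac{k(n-k)}{n} - \frac{S_k(S - S_k)}{n} - WY. \]

Let $\sigma^2 = k(n-k)/n$. Since $n/3 \le k \le 2n/3$ and $|W| \le |S_k| + |S|$, a simple
computation gives

\[ \frac{(T - \sigma^2)^2}{\sigma^2} = \frac{n(T - \sigma^2)^2}{k(n-k)} \le C\Bigl(\frac{S_k^2}{k} + \frac{S^2}{n} + 1\Bigr), \]

where $C$ is a universal constant.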

Now, clearly, $\mathbb{E}(\tilde{W}) = 0$ and $\mathbb{E}(\tilde{W}^2) < \infty$. The equation (15) holds and
the random variable $T$ is a.s. bounded. Therefore, all conditions for applying
Theorem 1.2 to $\tilde{W}$ are met, and hence we can conclude that it is possible
to construct a version of $\tilde{W}$ and a $N(0, \sigma^2)$ random variable $Z$ on the same
space such that for all $\theta$,

\[ \mathbb{E}\exp(\theta|\tilde{W} - Z|) \le 2\,\mathbb{E}\exp(2\theta^2\sigma^{-2}(T - \sigma^2)^2). \]

Since the value of $W$ is determined if we know $\tilde{W}$, we can now construct
a version of $W$ on the same probability space satisfying $|W - \tilde{W}| \le 1$. It
follows that

\[ \mathbb{E}\exp(\theta|W - Z|) \le 2\,\mathbb{E}\exp(|\theta| + 2\theta^2\sigma^{-2}(T - \sigma^2)^2). \]

Using the bound on $(T - \sigma^2)^2/\sigma^2$ obtained above, we have

\[ \mathbb{E}\exp(\theta|W - Z|) \le 2\exp(|\theta| + C\theta^2 S^2/n + C\theta^2)\,\mathbb{E}\exp(C\theta^2 S_k^2/k), \]

where, again, $C$ is a universal constant. The conclusion now follows from
Lemma 3.5 by choosing $\theta$ sufficiently small. □

4. The induction step 

The goal of this section is to prove the following theorem, which couples
a pinned random walk with a Brownian bridge. The tools used are Theorem 3.2
and induction. The induction hypothesis, properly formulated,
allows us to get rid of the dyadic construction of the usual KMT proofs.
The following is an alternative statement of Theorem 1.4, given here for the
convenience of the reader.






Theorem 4.1. Let us continue with the notation of Theorem 3.2. There
exist positive universal constants $C$, $K$ and $\lambda_0$ such that the following is
true. For any $n \ge 2$, and any possible value of $S_n$, it is possible to construct
a version of $W_0, W_1, \dots, W_n$ and gaussian r.v. $Z_0, Z_1, \dots, Z_n$ with mean zero
and

\[ \mathrm{Cov}(Z_i, Z_j) = \frac{(i \wedge j)(n - (i \vee j))}{n} \tag{16} \]

on the same probability space such that for any $\lambda \in (0, \lambda_0)$,

\[ \mathbb{E}\exp\bigl(\lambda\max_{i \le n}|W_i - Z_i|\bigr) \le \exp\Bigl(C\log n + \frac{K\lambda^2 S_n^2}{n}\Bigr). \]

Proof. Recall the universal constants $\alpha_0$ from Lemma 3.5 and $c$ and $\theta_0$ from
Theorem 3.2. We contend that for carrying out the induction step, it suffices
to take

\[ K = 8c, \qquad \lambda_0 \le \sqrt{\frac{\alpha_0}{16c}} \wedge \frac{\theta_0}{2}, \qquad \text{and} \qquad C \ge \frac{1 + \log 2}{\log(3/2)}. \tag{17} \]

Choosing the constants to satisfy these constraints, we will now prove the
claim by induction on $n$. Now, for each $n$, and each possible value $a$ of $S_n$,
let $f_a^n(\mathbf{s})$ denote the discrete probability density function of the sequence
$(S_0, S_1, \dots, S_n)$. Note that this is just the uniform distribution over $A_a^n$,
where

\[ A_a^n := \{\mathbf{s} \in \mathbb{Z}^{n+1} : s_0 = 0,\ s_n = a, \text{ and } |s_i - s_{i-1}| = 1 \text{ for all } i\}. \tag{18} \]

Thus, for any $\mathbf{s} \in A_a^n$,

\[ f_a^n(\mathbf{s}) = \frac{1}{|A_a^n|}. \tag{19} \]

Let $\phi^n(\mathbf{z})$ denote the probability density function of a gaussian random
vector $(Z_0, \dots, Z_n)$ with mean zero and covariance (16).

We want to show that for each $n$, and each possible value $a$ of $S_n$, we can
construct a joint probability density $\rho_a^n(\mathbf{s}, \mathbf{z})$ on $\mathbb{Z}^{n+1} \times \mathbb{R}^{n+1}$ such that

\[ \int \rho_a^n(\mathbf{s}, \mathbf{z})\,d\mathbf{z} = f_a^n(\mathbf{s}), \qquad \int \rho_a^n(\mathbf{s}, \mathbf{z})\,d\mathbf{s} = \phi^n(\mathbf{z}), \tag{20} \]

and for each $\lambda < \lambda_0$,

\[ \int \exp\Bigl(\lambda\max_{i \le n}\Bigl|s_i - \frac{ia}{n} - z_i\Bigr|\Bigr)\rho_a^n(\mathbf{s}, \mathbf{z})\,d\mathbf{s}\,d\mathbf{z} \le \exp\Bigl(C\log n + \frac{K\lambda^2 a^2}{n}\Bigr). \]

Suppose $\rho_a^k$ can be constructed for $k = 1, \dots, n-1$, for allowed values of $a$
in each case. We will now demonstrate a construction of $\rho_a^n$ when $a$ is an
allowed value for $S_n$.

First, fix a possible value $a$ of $S_n$ and an index $k$ such that $n/3 \le k \le 2n/3$
(for definiteness, take $k = [n/2]$). Given $S_n = a$, let $g_a^{n,k}(s)$ denote the
density function of $S_k$. Recall the definition (18) of $A_a^n$ and note that for all
allowed values $s$ of $S_k$, an elementary counting argument gives

\[ g_a^{n,k}(s) = \frac{|A_s^k|\,|A_{a-s}^{n-k}|}{|A_a^n|}. \tag{21} \]
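The counting formula (21) can be made concrete by using the standard count $|A_a^n| = \binom{n}{(n+a)/2}$ for paths with $\pm 1$ steps, valid when $n + a$ is even and $|a| \le n$. The sketch below is illustrative; the names `count_paths` and `midpoint_density` are hypothetical, and `midpoint_density` assumes that `a` is an allowed value of $S_n$.

```python
import math
from fractions import Fraction


def count_paths(n, a):
    """Number of +-1 paths of length n from 0 to a, i.e. the size of A_a^n."""
    if abs(a) > n or (n + a) % 2 != 0:
        return 0
    return math.comb(n, (n + a) // 2)


def midpoint_density(n, k, a, s):
    """Formula (21): P(S_k = s | S_n = a) for the pinned walk.

    Assumes `a` is an allowed value of S_n, so the denominator is nonzero.
    """
    return Fraction(count_paths(k, s) * count_paths(n - k, a - s), count_paths(n, a))


if __name__ == "__main__":
    n, k, a = 8, 4, 2
    for s in range(-k, k + 1):
        print(s, midpoint_density(n, k, a, s))
```

Summing (21) over $s$ gives 1, which is the Vandermonde identity in disguise.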

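Let $h^{n,k}(z)$ denote the density function of the gaussian distribution with
mean 0 and variance $k(n-k)/n$. By Theorem 3.2 and the inequality
$\exp|x| \le \exp(x) + \exp(-x)$, we see that there exists a joint density function
$\psi_a^{n,k}(s, z)$ on $\mathbb{Z} \times \mathbb{R}$ such that

\[ \int \psi_a^{n,k}(s, z)\,dz = g_a^{n,k}(s), \qquad \int \psi_a^{n,k}(s, z)\,ds = h^{n,k}(z), \tag{22} \]

and for all $0 < \theta \le \theta_0$,

\[ \int \exp\Bigl(\theta\Bigl|s - \frac{ka}{n} - z\Bigr|\Bigr)\psi_a^{n,k}(s, z)\,ds\,dz \le \exp\Bigl(1 + \frac{c\theta^2 a^2}{n}\Bigr). \tag{23} \]

Now define a function $\gamma_a^n : \mathbb{Z} \times \mathbb{R} \times \mathbb{Z}^{k+1} \times \mathbb{R}^{k+1} \times \mathbb{Z}^{n-k+1} \times \mathbb{R}^{n-k+1} \to \mathbb{R}$
as follows:

\[ \gamma_a^n(s, z, \mathbf{s}, \mathbf{z}, \mathbf{s}', \mathbf{z}') := \psi_a^{n,k}(s, z)\,\rho_s^k(\mathbf{s}, \mathbf{z})\,\rho_{a-s}^{n-k}(\mathbf{s}', \mathbf{z}'). \tag{24} \]

By integrating over $\mathbf{s}', \mathbf{z}'$, then $\mathbf{s}, \mathbf{z}$, and finally $s, z$, it is easy to verify that
$\gamma_a^n$ is a probability density function (if either $a$ or $s$ is not an allowed value,
then $\psi_a^{n,k}(s, z) = 0$, so there is no problem).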

Let $(S, Z, \mathbf{S}, \mathbf{Z}, \mathbf{S}', \mathbf{Z}')$ denote a random vector following the density $\gamma_a^n$.
In words, this means the following: We are first generating $(S, Z)$ from the
joint distribution $\psi_a^{n,k}$; given $S = s$, $Z = z$, we are independently generating
the pairs $(\mathbf{S}, \mathbf{Z})$ and $(\mathbf{S}', \mathbf{Z}')$ from the joint densities $\rho_s^k$ and $\rho_{a-s}^{n-k}$ respectively.

Now define two random vectors $\mathbf{Y} \in \mathbb{R}^{n+1}$ and $\mathbf{U} \in \mathbb{Z}^{n+1}$ as follows. For
$i \le k$, let

\[ Y_i = Z_i + \frac{i}{k}Z, \]

and for $i \ge k$, let

\[ Y_i = Z'_{i-k} + \frac{n-i}{n-k}Z. \]

Note that the two definitions match at $i = k$ because $Z_k = Z'_0 = 0$. Next,
define $U_i = S_i$ for $i \le k$ and $U_i = S + S'_{i-k}$ for $i \ge k$. Again, the definitions
match at $i = k$ because $S_k = S$ and $S'_0 = 0$. We claim that the joint density
of $(\mathbf{U}, \mathbf{Y})$ is a valid candidate for $\rho_a^n$. The claim is proved in several steps.



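1. Marginal distribution of $\mathbf{U}$. From equations (20) and (22) it is easy to
see that

\[ \int \gamma_a^n(s, z, \mathbf{s}, \mathbf{z}, \mathbf{s}', \mathbf{z}')\,d\mathbf{z}\,d\mathbf{z}'\,dz = g_a^{n,k}(s)\,f_s^k(\mathbf{s})\,f_{a-s}^{n-k}(\mathbf{s}'). \]

In other words, the distribution of the triplet $(S, \mathbf{S}, \mathbf{S}')$ can be described
as follows: Generate $S$ from the distribution of $S_k$ given $S_n = a$; then
independently generate $\mathbf{S}$ and $\mathbf{S}'$ from the conditional distributions $f_S^k$ and
$f_{a-S}^{n-k}$. It should now be intuitively clear that $\mathbf{U}$ has marginal density $f_a^n$.
Still, to be completely formal, we apply equations (19) and (21) to get

\[ g_a^{n,k}(s)\,f_s^k(\mathbf{s})\,f_{a-s}^{n-k}(\mathbf{s}') = \frac{|A_s^k|\,|A_{a-s}^{n-k}|}{|A_a^n|}\cdot\frac{1}{|A_s^k|}\cdot\frac{1}{|A_{a-s}^{n-k}|} = \frac{1}{|A_a^n|}, \]

and observe that there is a one-to-one correspondence between $(S, \mathbf{S}, \mathbf{S}')$ and
$\mathbf{U}$, and $\mathbf{U}$ can take any value in $A_a^n$.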

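2. Marginal distribution of $\mathbf{Y}$. First, we claim that $Z$, $\mathbf{Z}$, and $\mathbf{Z}'$ are
independent with densities $h^{n,k}$, $\phi^k$, and $\phi^{n-k}$ respectively. Again, using (20)
and (22), this is easily seen as follows.

\[ \int \gamma_a^n(s, z, \mathbf{s}, \mathbf{z}, \mathbf{s}', \mathbf{z}')\,d\mathbf{s}'\,d\mathbf{s}\,ds = \int \psi_a^{n,k}(s, z)\,\rho_s^k(\mathbf{s}, \mathbf{z})\,\phi^{n-k}(\mathbf{z}')\,d\mathbf{s}\,ds \]
\[ = \phi^{n-k}(\mathbf{z}')\int \psi_a^{n,k}(s, z)\,\rho_s^k(\mathbf{s}, \mathbf{z})\,d\mathbf{s}\,ds = \phi^{n-k}(\mathbf{z}')\,\phi^k(\mathbf{z})\int \psi_a^{n,k}(s, z)\,ds = \phi^{n-k}(\mathbf{z}')\,\phi^k(\mathbf{z})\,h^{n,k}(z). \]

Thus, $\mathbf{Y}$ is a gaussian random vector with mean zero. It only remains to
compute $\mathrm{Cov}(Y_i, Y_j)$. Considering separately the cases $i \le j \le k$, $k \le i \le j$,
and $i \le k \le j$, it is now straightforward to verify that $\mathrm{Cov}(Y_i, Y_j) = i(n-j)/n$
in each case. Thus, $\mathbf{Y} \sim \phi^n$.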
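The covariance computation in this step is easy to confirm with exact rational arithmetic. The sketch below builds the covariance of the glued vector from its three independent pieces (the midpoint variable with variance $k(n-k)/n$ and the two shorter bridges) and compares it with the bridge covariance (16). The function names `bridge_cov` and `glued_cov` are hypothetical, and the code assumes $0 < k < n$.

```python
from fractions import Fraction


def bridge_cov(i, j, m):
    """Covariance (16) for a bridge of length m: (i ^ j)(m - (i v j)) / m."""
    lo, hi = min(i, j), max(i, j)
    return Fraction(lo * (m - hi), m)


def glued_cov(i, j, n, k):
    """Cov(Y_i, Y_j) for Y built from Z and two independent shorter bridges."""
    var_mid = Fraction(k * (n - k), n)  # variance of the midpoint variable Z

    def coeff(t):
        # Coefficient of Z in Y_t: t/k to the left of k, (n-t)/(n-k) to the right.
        return Fraction(t, k) if t <= k else Fraction(n - t, n - k)

    cov = coeff(i) * coeff(j) * var_mid
    if i <= k and j <= k:
        cov += bridge_cov(i, j, k)  # both indices use the left bridge
    elif i >= k and j >= k:
        cov += bridge_cov(i - k, j - k, n - k)  # both use the right bridge
    # If the indices lie strictly on opposite sides of k, the bridge parts
    # are independent and contribute nothing.
    return cov


if __name__ == "__main__":
    n, k = 9, 4
    ok = all(
        glued_cov(i, j, n, k) == bridge_cov(i, j, n)
        for i in range(n + 1)
        for j in range(n + 1)
    )
    print("covariances agree:", ok)
```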

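3. The exponential bound. For $0 \le i \le n$, let

\[ W_i = U_i - \frac{ia}{n}. \]

We have to show that for $0 < \lambda < \lambda_0$,

\[ \mathbb{E}\exp\bigl(\lambda\max_{i \le n}|W_i - Y_i|\bigr) \le \exp\Bigl(C\log n + \frac{K\lambda^2 a^2}{n}\Bigr), \]

where $C$, $K$, and $\lambda_0$ are as in (17). Now let

\[ T_L := \max_{i \le k}\Bigl|S_i - \frac{i}{k}S - Z_i\Bigr|, \qquad T_R := \max_{i \ge k}\Bigl|S'_{i-k} - \frac{i-k}{n-k}(a - S) - Z'_{i-k}\Bigr|, \]

and

\[ T := \Bigl|S - \frac{ka}{n} - Z\Bigr|. \]

We claim that

\[ \max_{i \le n}|W_i - Y_i| \le \max\{T_L, T_R\} + T. \tag{25} \]

To prove this, first take any $i \le k$. Then

\[ |W_i - Y_i| = \Bigl|S_i - \frac{ia}{n} - \Bigl(Z_i + \frac{iZ}{k}\Bigr)\Bigr| \le \Bigl|S_i - \frac{i}{k}S - Z_i\Bigr| + \Bigl|\frac{iS}{k} - \frac{ia}{n} - \frac{iZ}{k}\Bigr| \le T_L + \frac{i}{k}T \le T_L + T. \]

Similarly, for $i \ge k$,

\[ |W_i - Y_i| = \Bigl|S + S'_{i-k} - \frac{ia}{n} - \Bigl(Z'_{i-k} + \frac{n-i}{n-k}Z\Bigr)\Bigr| \]
\[ \le \Bigl|S'_{i-k} - \frac{i-k}{n-k}(a - S) - Z'_{i-k}\Bigr| + \Bigl|S + \frac{i-k}{n-k}(a - S) - \frac{ia}{n} - \frac{n-i}{n-k}Z\Bigr| \]
\[ = \Bigl|S'_{i-k} - \frac{i-k}{n-k}(a - S) - Z'_{i-k}\Bigr| + \frac{n-i}{n-k}\Bigl|S - \frac{ka}{n} - Z\Bigr| \le T_R + T. \]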



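This proves (25). Now fix $\lambda < \lambda_0$. Using the crude bound $\exp(x \vee y) \le
\exp x + \exp y$, we get

\[ \exp\bigl(\lambda\max_{i \le n}|W_i - Y_i|\bigr) \le \exp(\lambda T_L + \lambda T) + \exp(\lambda T_R + \lambda T). \tag{26} \]

Now, by the construction (24), it is easy to check that given $(S, Z) = (s, z)$,
the conditional density of $(\mathbf{S}, \mathbf{Z})$ is simply $\rho_s^k$. By the induction hypothesis,
this implies that

\[ \mathbb{E}(\exp(\lambda T_L) \mid S, Z) \le \exp\Bigl(C\log k + \frac{K\lambda^2 S^2}{k}\Bigr). \]

It is easy to see that the moment generating functions of both $T_L$ and $T$ are
finite everywhere, and hence there is no problem in applying the Cauchy-Schwarz
inequality to get

\[ \mathbb{E}\exp(\lambda T_L + \lambda T) \le \bigl[\mathbb{E}\bigl(\mathbb{E}(\exp(\lambda T_L) \mid S, Z)^2\bigr)\,\mathbb{E}(\exp(2\lambda T))\bigr]^{1/2} \le \exp(C\log k)\Bigl[\mathbb{E}\exp\Bigl(\frac{2K\lambda^2 S^2}{k}\Bigr)\,\mathbb{E}\exp(2\lambda T)\Bigr]^{1/2}. \]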

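We wish to apply Lemma 3.5 to bound the first term inside the bracket.
Observe that by (17), we have

\[ 2K\lambda^2 = 16c\lambda^2 \le 16c\lambda_0^2 \le \alpha_0, \]

and also $n/3 \le k \le 2n/3$ by assumption. Hence Lemma 3.5 can indeed be
applied to get

\[ \mathbb{E}\exp\Bigl(\frac{2K\lambda^2 S^2}{k}\Bigr) \le \exp\Bigl(1 + \frac{3K\lambda^2 a^2}{2n}\Bigr). \]

Next, note that by (17), $2\lambda \le \theta_0$. Hence by inequality (23) with $\theta = 2\lambda$, we
get the bound

\[ \mathbb{E}\exp(2\lambda T) \le \exp\Bigl(1 + \frac{4c\lambda^2 a^2}{n}\Bigr). \]

Combining the last three steps, we have

\[ \mathbb{E}\exp(\lambda T_L + \lambda T) \le \exp\Bigl(C\log k + 1 + \frac{(3K + 8c)\lambda^2 a^2}{4n}\Bigr). \]

Now, by (17), $3K + 8c = 4K$. Again, since $n/3 \le k \le 2n/3$, we have

\[ \log k = \log n - \log(n/k) \le \log n - \log(3/2). \]

Thus,

\[ \mathbb{E}\exp(\lambda T_L + \lambda T) \le \exp\Bigl(C\log n - C\log(3/2) + 1 + \frac{K\lambda^2 a^2}{n}\Bigr). \]

By the symmetry of the situation, we can get the exact same bound on
$\mathbb{E}\exp(\lambda T_R + \lambda T)$. Combined with (26), this gives

\[ \mathbb{E}\exp\bigl(\lambda\max_{i \le n}|W_i - Y_i|\bigr) \le 2\exp\Bigl(C\log n - C\log(3/2) + 1 + \frac{K\lambda^2 a^2}{n}\Bigr). \]

Finally, from the condition on $C$ in (17), we see that

\[ -C\log(3/2) + 1 + \log 2 \le 0. \]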

This completes the induction step. To complete the argument, we just
choose $C$ so large and $\lambda_0$ so small that the result is true for $n = 2$ even if
the vectors $(W_0, W_1, W_2)$ and $(Z_0, Z_1, Z_2)$ are chosen to be independent of
each other. □

5. Completing the proofs of the main theorems 

In this final section, we put together the pieces to complete the proofs of
Theorem 1.4 and Theorem 1.5. The following lemma combines Theorem 4.1
and Theorem 3.1 to give a 'finite n version' of Theorem 1.5.

Lemma 5.1. There exist universal constants $B > 1$ and $\lambda > 0$ such that
the following is true. Let $n$ be a positive integer and let $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ be
i.i.d. symmetric $\pm 1$ random variables. Let $S_k = \sum_{i=1}^{k}\varepsilon_i$, $k = 0, 1, \dots, n$.
It is possible to construct a version of the sequence $(S_k)_{k \le n}$ and gaussian
random variables $(Z_k)_{k \le n}$ with mean 0 and $\mathrm{Cov}(Z_i, Z_j) = i \wedge j$ on the same
probability space such that $\mathbb{E}\exp(\lambda|S_n - Z_n|) \le B$ and

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le n}|S_k - Z_k|\bigr) \le B\exp(B\log n). \]

Proof. Recall the universal constants $\theta_0$ and $\kappa$ from Theorem 3.1 and $C$, $K$,
and $\lambda_0$ from Theorem 4.1. Choose $\lambda$ so small that

\[ \lambda < \frac{\theta_0}{2} \wedge \lambda_0 \qquad \text{and} \qquad 8K\lambda^2 < 1. \]

Let the probability densities $\rho_a^n$, $f_a^n$ and $\phi^n$ be as in the proof of Theorem 4.1.
Let $g^n$ and $h^n$ denote the densities of $S_n$ and $Z_n$ respectively. By
Theorem 3.1 and the choice of $\lambda$, there is a joint density $\psi^n$ on $\mathbb{Z} \times \mathbb{R}$ such
that

\[ \int \psi^n(s, z)\,dz = g^n(s), \qquad \int \psi^n(s, z)\,ds = h^n(z), \]

and

\[ \int \exp(2\lambda|s - z|)\,\psi^n(s, z)\,ds\,dz \le \kappa. \tag{27} \]

Now define a function $\gamma^n : \mathbb{Z} \times \mathbb{R} \times \mathbb{Z}^{n+1} \times \mathbb{R}^{n+1} \to \mathbb{R}$ as

\[ \gamma^n(s, z, \mathbf{s}, \mathbf{z}) := \psi^n(s, z)\,\rho_s^n(\mathbf{s}, \mathbf{z}). \]

It is easy to check that this is a probability density function. Let $(S, Z, \mathbf{S}, \mathbf{Z})$
be a random vector following this density. As in the proof of Theorem 4.1,
an easy integration shows that the joint density of $(Z, \mathbf{Z})$ is simply

\[ h^n(z)\,\phi^n(\mathbf{z}). \]

Define a random vector $\mathbf{Y} = (Y_0, \dots, Y_n)$ as

\[ Y_i = Z_i + \frac{i}{n}Z. \]

By the independence of $Z$ and $\mathbf{Z}$ and their distributions, it follows that $\mathbf{Y}$
is a mean zero gaussian random vector with $\mathrm{Cov}(Y_i, Y_j) = i \wedge j$.

Next, integrating out $z$ and $\mathbf{z}$ we see that the joint density of $(S, \mathbf{S})$ is

\[ g^n(s)\,f_s^n(\mathbf{s}). \]

Elementary probabilistic reasoning now shows that the marginal distribution
of $\mathbf{S}$ is the same as that of a simple random walk up to time $n$.

Let us now show that the law of the pair $(\mathbf{S}, \mathbf{Y})$ satisfies the conditions
of the theorem. First, let $W_i = S_i - iS/n$. Note that for any $i \le n$,

\[ |S_i - Y_i| = \Bigl|W_i + \frac{i}{n}S - \Bigl(Z_i + \frac{i}{n}Z\Bigr)\Bigr| \le |W_i - Z_i| + \frac{i}{n}|S - Z|. \]

Note that the conditional distribution of $(\mathbf{S}, \mathbf{Z})$ given $(S, Z) = (s, z)$ is simply
$\rho_s^n$. Since $\lambda < \lambda_0$, we have by the construction of $\rho_s^n$ that

\[ \mathbb{E}\bigl(\exp(\lambda\max_{i \le n}|W_i - Z_i|) \bigm| S, Z\bigr) \le \exp\Bigl(C\log n + \frac{K\lambda^2 S^2}{n}\Bigr). \]

Thus, using the Cauchy-Schwarz inequality and (27), we can now get

\[ \mathbb{E}\exp\bigl(\lambda\max_{i \le n}|S_i - Y_i|\bigr) \le \bigl[\mathbb{E}\bigl(\mathbb{E}(\exp(\lambda\max_{i \le n}|W_i - Z_i|) \mid S, Z)^2\bigr)\,\mathbb{E}\exp(2\lambda|S - Z|)\bigr]^{1/2} \le \exp(C\log n)\bigl[\kappa\,\mathbb{E}\exp(2K\lambda^2 S^2/n)\bigr]^{1/2}. \]

By inequality (13) and the choice of $\lambda$, the proof of the maximal inequality
is done. For the other inequality, note that we have (27) and $Y_n = Z$ since
$Z_n = 0$. □

Proofs of Theorems 1.4 and 1.5. The proof of Theorem 1.4 follows trivially
from Theorem 4.1. The proof of Theorem 1.5 also follows quite easily from
Lemma 5.1, but some more work is required. We carry out the few remaining
steps below.

For $r = 1, 2, \dots$ let $m_r = 2^{2^r}$, and $n_r = m_r - m_{r-1}$. For each $r$, let
$(S_k^{(r)}, Z_k^{(r)})_{0 \le k \le n_r}$ be a random vector satisfying the conclusions of Lemma 5.1,
and suppose these random vectors are independent. Inductively define
an infinite sequence $(S_k, Z_k)_{k \ge 0}$ as follows. Let $S_k = S_k^{(1)}$ and $Z_k = Z_k^{(1)}$ for
$k \le m_1$. Having defined $(S_k, Z_k)_{k \le m_{r-1}}$, define $(S_k, Z_k)_{m_{r-1} < k \le m_r}$ as

\[ S_k := S^{(r)}_{k - m_{r-1}} + S_{m_{r-1}}, \qquad Z_k := Z^{(r)}_{k - m_{r-1}} + Z_{m_{r-1}}. \]

Clearly, since the increments are independent, $S_k$ and $Z_k$ are indeed random
walks with binary and gaussian increments respectively.

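Now recall the constants $B$ and $\lambda$ in Lemma 5.1. First, note that for each
$r$, by Lemma 5.1 and independence we have

\[ \mathbb{E}\exp(\lambda|S_{m_r} - Z_{m_r}|) \le \mathbb{E}\exp\Bigl(\lambda\sum_{\ell=1}^{r}\bigl|S^{(\ell)}_{n_\ell} - Z^{(\ell)}_{n_\ell}\bigr|\Bigr) = \prod_{\ell=1}^{r}\mathbb{E}\exp\bigl(\lambda\bigl|S^{(\ell)}_{n_\ell} - Z^{(\ell)}_{n_\ell}\bigr|\bigr) \le B^r. \tag{28} \]

Next, let

\[ C := \frac{1}{1 - \frac{1}{B}\exp\bigl(-\frac{B}{2}\log 4\bigr)}. \]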

We will show by induction that for each $r$,

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le m_r}|S_k - Z_k|\bigr) \le C B^r\exp(B\log m_r). \tag{29} \]

By Lemma 5.1 and the facts that $B > 1$ and $C > 1$, this holds for $r = 1$.
Suppose it holds for $r - 1$. By the inequality $\exp(x \vee y) \le \exp x + \exp y$, we
have

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le m_r}|S_k - Z_k|\bigr) \le \mathbb{E}\exp\bigl(\lambda\max_{m_{r-1} < k \le m_r}|S_k - Z_k|\bigr) + \mathbb{E}\exp\bigl(\lambda\max_{k \le m_{r-1}}|S_k - Z_k|\bigr). \tag{30} \]

Let us consider the first term. We have

\[ \max_{m_{r-1} < k \le m_r}|S_k - Z_k| \le \max_{k \le n_r}\bigl|S_k^{(r)} - Z_k^{(r)}\bigr| + |S_{m_{r-1}} - Z_{m_{r-1}}|. \]

Thus, by independence and Lemma 5.1 and the inequality (28), we get

\[ \mathbb{E}\exp\bigl(\lambda\max_{m_{r-1} < k \le m_r}|S_k - Z_k|\bigr) \le B^r\exp(B\log m_r). \]

By the induction hypothesis and the relation $m_r = m_{r-1}^2$, we see that the
second term in (30) has the bound

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le m_{r-1}}|S_k - Z_k|\bigr) \le C B^{r-1}\exp(B\log m_{r-1}) = C B^{r-1}\exp\Bigl(\frac{B\log m_r}{2}\Bigr). \]

Combining, we get

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le m_r}|S_k - Z_k|\bigr) \le B^r\exp(B\log m_r)\Bigl(1 + \frac{C}{B}\exp\Bigl(-\frac{B\log m_r}{2}\Bigr)\Bigr). \]

From the definition of $C$, it is easy to verify (since $m_r \ge 4$) that the term
within the parentheses in the above expression is bounded by $C$. This completes
the induction step.
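The role of the constant $C$ in closing the induction can be seen numerically: $C$ is exactly the fixed point of $x \mapsto 1 + (x/B)\exp(-(B/2)\log 4)$, so the bracket equals $C$ at $m_r = 4$ and is smaller for larger $m_r$. A minimal sketch follows, with hypothetical function names and with values of $B > 1$ chosen arbitrarily for illustration.

```python
import math


def induction_constant(B):
    """C := 1 / (1 - exp(-(B/2) * log 4) / B), defined for B > 1."""
    return 1.0 / (1.0 - math.exp(-(B / 2.0) * math.log(4.0)) / B)


def bracket(B, C, m):
    """The factor 1 + (C/B) * exp(-(B * log m) / 2) from the induction step."""
    return 1.0 + (C / B) * math.exp(-(B / 2.0) * math.log(m))


if __name__ == "__main__":
    for B in (1.5, 2.0, 10.0):
        C = induction_constant(B)
        print(B, C, [bracket(B, C, m) for m in (4, 16, 256)])
```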

So we have now shown (29). Since $r \le \text{const.}\log m_r$, this shows that
there exists a constant $K$ such that for all $r$,

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le m_r}|S_k - Z_k|\bigr) \le K\exp(K\log m_r). \]

Now let us prove such an inequality for arbitrary $n$ instead of $m_r$. Take any
$n \ge 2$. Let $r$ be such that $m_{r-1} \le n \le m_r$. Then $m_r = m_{r-1}^2 \le n^2$. Thus,

\[ \mathbb{E}\exp\bigl(\lambda\max_{k \le n}|S_k - Z_k|\bigr) \le \mathbb{E}\exp\bigl(\lambda\max_{k \le m_r}|S_k - Z_k|\bigr) \le K\exp(K\log m_r) \le K\exp(2K\log n). \]

It is now easy to complete the argument using Markov's inequality. □
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